ABSTRACT


This thesis has two parts, both related to the development
of smart sensor systems. The first part is a theoretical
development of two families of adaptive spatial filters for
suppressing background clutter in infrared images, based
on the minimization of mean squared error or the maximization
of signal to noise ratio criterion. Seven different nonlinear
search techniques have been developed for the adaptation
process. They have been applied to two real world infrared test
images and exhibit a fast convergence rate with no misadjustment.
The second part is an experimental development of a
multiple microcomputer system which can be a candidate for an
on-board processor system. A multiple star, multiple cluster
architecture was developed whose intercommunication is managed
by a three level control including central controller,
distributed controller and random priority controller. The
adaptive spatial filter has been successfully implemented on
this system using partitioning for parallel computing.
I. INTRODUCTION


A. OBJECTIVES
1. Dual Objectives of this Thesis

This thesis consists of two closely related studies. 

a. The first study is the theoretical development 
of adaptive image processing algorithms for enhancement of 
"target signal" to "clutter noise" ratio in images. It will 
be used in the first step of a multiple-stage image process- 
ing program for detection of dim targets in noisy infrared 
images. 

b. The second study is an experimental development 
of a multiple microcomputer system for implementation of these 
adaptive image processing algorithms. 

These two studies belong to two different technical
areas. Either topic could be the subject of one thesis
project. However, they are investigated together in this thesis
because of the special nature of a newly emerging field which
inspired the research undertaken by this project. This new
field is sometimes known as "Smart Sensors" [1, 2, 3].
Its development got into high gear only in the late 1970's
when advances in two integrated circuit fields, VLSI digital
electronics and mosaic optical sensor arrays, were joined
together to develop new optical sensors which also have
sophisticated on-board signal/data processing capabilities.
In other words, they are SMART SENSORS. Their importance is
closely associated with the coexistence of "sensing" and
"processing" capabilities on a small volume, light weight,
low power platform. Therefore, the successful development
of "smart sensor" systems includes not only new signal/data
processing algorithms to provide the needed "smartness" but
also efficient implementation by signal/data processors whose
size, weight, power and performance are compatible with the
requirements of on-board equipment in many practical military
systems.
2. Multi-Dimensional "Smart Sensor" Signal Processing

In most optical smart sensor systems, signals of
interest are in the form of images. If the field of view
of the sensor platform is not stabilized, or locked onto a
target, successive frames of images are not registered.
Signal processing can only use single frames of an image.
Therefore, the signal is two dimensional in terms of the
spatial variables x and y. If sensors in several spectral
bands are available and well registered spatially, the
signals are three dimensional in terms of the variables x, y and λ.

In many other smart sensor systems, the field of
view of the sensor platform either does not change (as in
a synchronous orbit satellite with staring sensors) or is
stabilized (as in aircraft with step-staring sensors) or
is locked onto a target (as in missiles after they have
already acquired a target). In these cases, successive images
are registered. Both single frames of images and multiple
frames of images are available for signal processing. The
signal is then three dimensional in terms of x, y and t. In
addition, if multi-spectral sensors are registered, the signal
is four dimensional in terms of x, y, t and λ.

Therefore, signal processing operations required for 
smart sensors are often multi-dimensional. This thesis is 
concerned with adaptive spatial filters processing infrared 
images. This type of spatial filter should be distinguished 
from the majority of image processing methods which are con- 
cerned with the image itself as the signal of interest. 

Our primary interest is in the targets. The image
itself, often called the background clutter, is considered
as noise and must be suppressed so that dim target signals
can be revealed to allow the application of a threshold to
facilitate the detection process. In addition to the clutter,
the image may also include other noise, man-made interference
and jamming, which are all treated as noise. Only
targets are considered as signals.

3. Multiple Stages "Smart Sensor" Signal Processing

To accomplish the objectives of most smart sensor
systems in detecting, tracking and recognizing very dim
targets deeply buried in noise, a multiple stage image
processing approach is generally needed (Table I.1).


TABLE I.1

IMAGE PROCESSING STAGES

Stage             Objective             Processing

Pre-threshold     Enhancement           Hard limiting
                                        Adaptive filtering
Threshold         Detection             Adaptive threshold
Post-threshold    Target Acquisition
                  Tracking              Kalman tracker
                  Recognition


For more detail, see Chapter III.B.2. 


This thesis will concentrate on the development of
new adaptive filter techniques which will be used in the
"Enhancement" stage to improve the "target signal" to
"background clutter noise" ratio by either suppressing the
background clutter or enhancing the target signal, or both.


B. STATISTICAL IMAGE PROCESSING TECHNIQUES FOR
ENHANCEMENT OF "TARGET SIGNAL" TO "BACKGROUND
NOISE" RATIO IN INFRARED IMAGES

1. Introduction

Although the responsibility of detecting very dim
targets is shared by several steps of image processing in
pre-threshold, threshold and post-threshold stages, the
"enhancement" step before thresholding plays a very important
role because it is necessary to improve the "target signal"
to "clutter noise" ratio to approximately one before a
threshold operation can be applied. Otherwise, there will
be too many false alarms collected by the thresholding step,
which makes post-threshold signal processing difficult.
Therefore, in theoretical developments of new image processing
techniques for smart sensors, a great deal of attention
is given to background clutter suppression techniques for
enhancement of the signal to noise ratio before the thresholding
step.

We have made a survey of these techniques and present
them in several classifications in Table I.2. First, they
are classified as nonadaptive, open loop adaptive and closed
loop adaptive. By "nonadaptive," we refer to those approaches
whose filters are not designed by using the image
characteristics. However, in the two adaptive cases, the filters are
tailor-designed based on the characteristics learned from the
images being processed. In the open loop adaptive case, the
filter is not able to update or correct itself when the
characteristics of the image are changed. The image properties
must be "relearned" before a redesign of the filter can be
made. In the closed loop adaptive case, a feedback process
is provided between the filter output and the input to the
design process. In this way, any change in the image
characteristics will result in an increase of the output error
which is used to automatically update and correct the filter
design.
TABLE I.2


FOCAL PLANE PROCESSING TECHNIQUES FOR BACKGROUND CLUTTER SUPPRESSION


FOCAL PLANE PROCESSING ALGORITHMS | ACTIVE GROUPS


NONADAPTIVE, DETERMINISTIC

SPATIAL: 1st order, 2nd order (Laplacian), 4th order nonrecursive spatial filter
TEMPORAL: Frame to frame differencing (nonrecursive temporal filter): 1st and 2nd differencing, 3rd differencing
Three dimensional spatial-temporal filter by variational method | Rockwell
Pseudo-reticle nonrecursive spatial filter followed by recursive temporal bandpass or highpass filter
SPATIAL-SPECTRAL: Nonrecursive spatial filter followed by two color discrimination | MIT Lincoln Laboratory*
Normalization (localized adaptive threshold) | General Electric*
Bandpass filter followed by adaptive threshold | Aerojet ElectroSystems*
2nd, 3rd order recursive temporal highpass filter


OPEN LOOP ADAPTIVE, STATISTICAL

Minimization of mean square error: recursive Kalman filter (spatial)
SPATIAL: Nonrecursive Wiener filter (spatial) | Lockheed, NPGS
Maximization of signal to noise ratio: nonrecursive spatial matched filter
Maximization of likelihood ratio | Aerospace Corp.
Minimization of mean square error
Maximization of signal to noise ratio | NPGS
Maximization of signal to noise ratio | NPGS
Minimization of mean square error: nonrecursive spatial filter
Maximization of signal to noise ratio | NPGS
Minimization of mean square error | NPGS
Maximization of signal to noise ratio | NPGS
TEMPORAL: Nonrecursive temporal Wiener filter | Lockheed
Recursive temporal Kalman filter
SPATIAL-TEMPORAL


* Techniques developed for tactical systems

These approaches are further classified as deterministic
and statistical. In deterministic cases, the filter
design is based on non-statistical properties of the image,
such as its frequency characteristics. In statistical cases,
the filter design is based on statistical properties of the
image, such as its autocorrelation or power spectral density.

Furthermore, they are classified according to the
types of signal processing operations used: spatial, temporal,
spectral or some of their combinations.

2. Open Loop Adaptive Filter

In our research group, several nonrecursive open loop
adaptive filters have been developed. D. Bar
Yehoshua [4] first developed the nonrecursive statistical
spatial filters designed by a minimization of mean squared
error criterion using theoretically generated images based
on both the first and second order Markov models. These
images are all assumed to have zero mean. D. Hilmers [5]
extended these spatial filters to process real world images
which have non-zero mean. Further, he extended the same
concept to nonrecursive statistical temporal filters. B. Evenor
[6] made two additional extensions. First, he developed the
design procedures for spatial filters based on the maximization
of signal to noise ratio. Second, he developed a closed
loop adaptive spatial filter by extending the LMS (least mean
square) algorithm used by many one dimensional adaptive filter
researchers. It will be discussed further in the next section.

Using several real world infrared test images, these
open loop adaptive filters have been found to be very effective
in suppressing background clutter for point targets. However,
they are not responsive to any change in the characteristics
of the image being processed.

3. Closed Loop Adaptive Filter and this Thesis

The realization of this lack of true adaptive capability
led to the study of B. Evenor [6] who developed the
nonrecursive closed loop adaptive spatial filter based on the
"LMS" algorithm, and tested this approach on theoretically
generated images using Markov models. However, it was
discovered that the LMS algorithm is actually a simplified version
of a more general and powerful family of closed loop adaptive
filters. It was decided that the first part of this thesis
would be to develop such a general adaptive filter approach
which includes:

a. Two optimization criteria:

Minimization of mean square error
Maximization of signal to noise ratio

b. General adaptation equation using gradient search
models

c. A family of nonlinear searching techniques to carry
out the adaptation process.

The details of this theoretical study will be presented in
Chapter II.
C. IMPLEMENTATION OF THE IMAGE PROCESSING PROGRAM
BY A MULTIPLE MICROCOMPUTER SYSTEM


1. Introduction
A parallel effort has been made in the investigation
of practical implementation of these statistical nonadaptive
image processing algorithms developed in our research group.
The study in [7] first investigated the execution speed and
accuracy of these image processing algorithms on a main frame
computer, IBM 360/67.
2. Microcomputer Implementation
D. Becker [8] investigated the performance of
implementations of the nonadaptive image processing algorithms on
one 16 bit LSI-11 microcomputer and a combination of this
LSI-11 microcomputer and a microcomputer compatible CDA-MSP-3
array processor. It was found that using high order language
programming and floating point data format, today's
microcomputer implementation is still in its infancy. Its execution
speed is slow and not anywhere near any real time processing
requirements. Improvements in microcomputer implementation
by using assembly language programming, integer data format
and improved programming on the array processor are currently
being developed.
3. Multiple Microcomputer Implementation and this Thesis
It is obvious that to achieve real time image processing
performance using microcomputers, several improvements
should be considered simultaneously. First, the processing
capability of individual microcomputers must be improved by
more imaginative programming and by using attached special
processors, such as the array processor. Second, and probably
much more important, is to take advantage of the rapidly
increasing number of microcomputers affordable in a system
by cleverly orchestrating them into an effective concurrent
parallel and pipeline execution of the whole image processing
program. The advantages offered by this type of multiple
microcomputer approach do not stop at faster execution only, but
also include multi-tasking and higher reliability because of
better fault tolerance. It was decided that to fully meet
the needs of new research for the successful development of
a smart sensor, a second part of this thesis should address
the implementation issue of image processing algorithms by a
multiple microcomputer system. Its details will be presented
in Chapter III.
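
The partitioning idea can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function names (`filter_strip`, `partitioned_filter`), the use of threads as stand-ins for separate processors, and the 3 by 3 averaging kernel are hypothetical and do not describe the multiple microcomputer system of Chapter III. The image is cut into horizontal strips, each strip is filtered by its own worker, and the partial results are stacked back together.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def filter_strip(strip, kernel):
    """Apply a nonrecursive k-by-k spatial filter to one strip (valid region only)."""
    k = kernel.shape[0]
    rows, cols = strip.shape[0] - k + 1, strip.shape[1] - k + 1
    out = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            out[r, c] = np.sum(strip[r:r + k, c:c + k] * kernel)
    return out


def partitioned_filter(image, kernel, n_workers=4):
    """Split the image into row strips, filter them concurrently, and reassemble."""
    k = kernel.shape[0]
    out_rows = image.shape[0] - k + 1
    bounds = np.linspace(0, out_rows, n_workers + 1).astype(int)
    # Each strip carries k-1 extra rows so that boxes straddling a
    # strip boundary are still complete.
    strips = [image[lo:hi + k - 1]
              for lo, hi in zip(bounds[:-1], bounds[1:]) if hi > lo]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda s: filter_strip(s, kernel), strips))
    return np.vstack(parts)


rng = np.random.default_rng(0)
image = rng.normal(size=(16, 12))
kernel = np.full((3, 3), 1.0 / 9.0)      # hypothetical averaging kernel
serial = filter_strip(image, kernel)
parallel = partitioned_filter(image, kernel)
```

Because each strip overlaps its neighbor by two rows, the partitioned result is identical to the serial one; only the scheduling of the work differs.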


D. SCOPE AND EXTENSION OF THIS THESIS

It should be strongly emphasized that although this thesis
specifically developed a family of adaptive spatial filters
for the enhancement of target signal to noise ratio of images
and a multiple microcomputer system for the implementation
of the image processing, the motivation of this thesis is
to contribute to the development of smart sensor systems.
Therefore, the adaptive filter concepts and design techniques
are not limited to spatial filters only. They can
be readily extended to a wide class of problems of poor
signal to noise ratios. The implementation issue is not
limited to adaptive filter processing only. The multiple
microcomputer system is designed to implement not only the
mission signal processing but also a host of other signal/
data processing tasks for management, command, control and
communication functions.





II. ADAPTIVE IMAGE PROCESSING 


A. INTRODUCTION 
1. General 

The idea of an adaptive filter is inherently attrac- 
tive. It does not take any stretch of imagination to see a 
myriad of advantages offered by an adaptive filter which can 
automatically update itself when it is not performing accord- 
ing to an optimum criterion. The development of adaptive 
filters started in the early 1960's when it was extended 
from the sampled data control system [9] and when it was 
developed for adaptive antenna applications [10]. In ensuing 
years, a large number of investigations were made for appli- 
cations in antennas [11], noise cancellation [12] and a 
variety of filtering applications [13-48]. 

It is natural that adaptive filter concepts are very 
attractive for the objective of this thesis--to detect very 
dim targets deeply buried in infrared background clutter. 
However, a survey of adaptive filter research published in 
the 70's reveals the following facts: 


a. Practically all of the past adaptive filter
research dealt with one dimensional problems.


b. LMS (least mean square) error has been the most
widely used criterion. Very little attention has
been given to other criteria, such as the maximization
of output signal to noise ratio, which is
probably better suited for detection problems.

c. Very little attention has been given to the convergence
speed issue of adaptive filters.


It was therefore decided to address these three issues
and develop new adaptive image processing techniques which
are multi-dimensional, using either the mMSE (minimization
of mean square error) or the MSNR (maximization of signal to
noise ratio) criterion, and using a family of nonlinear
convergence techniques developed in the optimization field to
search for the extremum in the adaptive process.

However, the basic concept of the adaptive filter
and the traditional LMS approach will be briefly reviewed
first as a starting point to introduce new techniques
developed in this thesis.

2. Basic Concepts of Adaptive Filters

The basic concepts of an adaptive filter can be
described concisely as follows:

A filter is represented by a vector $\underline{H}$. In an
adaptive filter, $\underline{H}$ is updated in successive iteration steps
described by a subscript as $\underline{H}_K$, $\underline{H}_{K+1}$. A correction term,
$\Delta\underline{H}_K$, is generated in each iteration step such that

$$\underline{H}_{K+1} = \underline{H}_K + \Delta\underline{H}_K$$

The iteration steps are carried out to optimize a selected
performance function until the filter converges to its steady
state which also corresponds to the reaching of an extremum
of the performance function surface.
The filter could be a temporal filter or a spatial
filter. It could be a recursive filter, also called an infinite
impulse response (IIR) or zero/pole filter, or a nonrecursive
filter, also called a finite impulse response (FIR) or
all zero filter.

The performance function could be the mean square
error, or the output signal to noise ratio, or other functions
such as the likelihood ratio. The optimization objective
could be either minimization or maximization.

In this thesis, two dimensional spatial filters are
considered. They are the nonrecursive type. Two types of
cost functions are used. Their optimization objectives are
shown in the following table.


TABLE II.1


OBJECTIVE FUNCTIONS


Adaptive filter    Performance Function             Optimization Goal

mMSE               Mean Square Error                Minimization
MSNR               Output Signal to Noise Ratio     Maximization


Let us consider a nonrecursive spatial filter with a filter
area of 3 by 3 pixels, which has nine filter coefficients.
The cost function is a surface in a nine dimensional space.
The goal of the iterative adaptation procedure is to search
the filter coefficient space for the coordinates of the extreme
point (either a minimum or a maximum) of the performance
function surface.
3. Traditional Approach - LMS Algorithm


An overwhelmingly large portion of the past adaptive
filter studies followed the approach originated by Professor
B. Widrow [14], and commonly known as the LMS (least mean
square) algorithm.

The performance function used in this approach is
the "mean square error." The optimization goal is "minimization."
Prof. Widrow proposed that the adaptation term $\Delta\underline{H}$ be
expressed as:

$$\Delta\underline{H} = 2\mu\varepsilon\underline{X}$$

where

$\underline{X}$ = signal being processed
$2\mu$ = a constant, called adaptive gain
$\varepsilon$ = adaptation error = $d - \underline{H}^T\underline{X}$
$d$ = reference (or desired signal)
$\underline{H}$ = filter coefficient vector.

The adaptation equation is then

$$\underline{H}_{K+1} = \underline{H}_K + 2\mu\varepsilon\underline{X}$$

The steepest descent search technique is then used for
performing the adaptation steps.
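
A minimal Python sketch of the LMS update described above may help fix ideas. It is an illustration only: the function names (`lms_step`, `lms_train`), the gain value `mu = 0.01`, the number of passes and the synthetic, noise-free training data are assumptions made for this example and are not taken from the thesis. Each row of `boxes` plays the role of one signal vector X with nine elements (a 3 by 3 filter area), and `desired` holds the reference signal d.

```python
import numpy as np


def lms_step(h, x, d, mu):
    """One LMS iteration: returns the updated filter vector and the error."""
    err = d - h @ x                        # epsilon = d - H^T X
    return h + 2.0 * mu * err * x, err     # H_{K+1} = H_K + 2*mu*epsilon*X


def lms_train(boxes, desired, mu=0.01, n_passes=20):
    """Run the LMS update over all samples for several passes, starting from H = 0."""
    h = np.zeros(boxes.shape[1])
    for _ in range(n_passes):
        for x, d in zip(boxes, desired):
            h, _ = lms_step(h, x, d, mu)
    return h


# Synthetic data (an assumption for this sketch): the desired signal is an
# exact linear function of the input, so LMS can recover h_true.
rng = np.random.default_rng(0)
h_true = rng.normal(size=9)
boxes = rng.normal(size=(500, 9))
desired = boxes @ h_true
h_est = lms_train(boxes, desired)
```

With a constant gain and noisy data the estimate would keep fluctuating around the optimum; the noise-free data used here is what allows the sketch to settle on `h_true`.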

Although this traditional LMS approach has been used 
by most of the adaptive filter researchers, it is not without 
certain drawbacks which will be briefly described as follows. 

The adaptation equation used in the traditional approach
can be considered as a special case of a more general
adaptation equation,

$$\underline{H}_{K+1} = \underline{H}_K + \alpha_K\underline{G}_K$$

an equation commonly used in the field of optimization.
The term $\underline{G}_K$ is sometimes called the "gradient," meaning the
gradient of the performance function surface. The term $\alpha_K$
is sometimes called the "step size," meaning the displacement
in the vector space $\underline{H}$. The optimization procedure at
iteration step K+1 gives a filter vector $\underline{H}_{K+1}$ which is closer to
the optimal vector $\underline{H}^*$ than previous filter vectors. Therefore,
Prof. Widrow's imaginative proposal can be interpreted
as the following two assumptions:

$$\underline{G}_K = \varepsilon\underline{X}$$

$$\alpha_K = 2\mu = \text{a constant.}$$

These two bold assumptions probably have resulted in several
inherent limitations.

a. Because the gradient $\underline{G}_K$ is not tailored to the
performance function, convergence could be slow. Further, the
steady state filter result may not yield the best estimation.
Possibly, a steady state misadjustment could exist [24].

b. Because the step size $\alpha_K$ is assumed to be a
constant, the adaptation procedure may never reach a steady
state.
4. This Thesis Research 
In view of the results of the survey and review of
the status of the adaptive filter approach as presented above,
we identified a series of research problems which must be
investigated in order to develop adaptive image processing
techniques for suppressing background clutter in infrared
images and for helping the detection of dim targets.

First, we must extend the one dimensional adaptive
filter techniques based on the mMSE criterion to two
dimensions.

Second, we should develop an adaptive filter based
on the MSNR criterion which is presented in section B.

Third, we should develop a new adaptive equation
which is more responsive to the performance function in order
to improve convergence speed and to minimize steady state
misadjustment. In other words, the adaptive equation is in
the form of

$$\underline{H}_{K+1} = \underline{H}_K + \alpha_K\underline{G}_K$$

The step size $\alpha_K$ and gradient $\underline{G}_K$ will not take the form of
$2\mu$ and $\varepsilon\underline{X}$ as is customarily done in practically all of the
past adaptive filter studies based on the LMS algorithm.

Fourth, we will investigate a variety of nonlinear
gradient techniques to search for the minimum in the case
of the mMSE filter and the maximum in the case of the MSNR filter.
They are derived and presented in sections C and D,
respectively.

The results of applying these adaptive spatial filters
to two real infrared images will be presented in section F.
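
To make the general form H(K+1) = H(K) + alpha(K) G(K) concrete, the following Python sketch applies it to the quadratic mMSE cost J = H'R_XX H - 2 H'R_XS + d derived in section B. It is a sketch under assumptions, not the procedure of this thesis: G(K) is taken as the negative gradient of J, alpha(K) is chosen by an exact line search along G(K), and the function name `best_step_descent` and the synthetic correlation data are hypothetical.

```python
import numpy as np


def best_step_descent(r_xx, r_xs, n_iter=100):
    """Iterate H_{K+1} = H_K + alpha_K * G_K on J = H'R_XX H - 2 H'R_XS + d."""
    h = np.zeros_like(r_xs)
    for _ in range(n_iter):
        g = -2.0 * (r_xx @ h - r_xs)          # G_K: negative gradient of J at H_K
        curvature = g @ r_xx @ g
        if curvature <= 0.0:                  # gradient vanished: stationary point
            break
        alpha = (g @ g) / (2.0 * curvature)   # step minimizing J along G_K
        h = h + alpha * g
    return h


# Synthetic correlation estimates (an assumption for this sketch).
rng = np.random.default_rng(3)
boxes = rng.normal(size=(500, 9))
targets = boxes @ rng.normal(size=9)
r_xx = boxes.T @ boxes / len(boxes)
r_xs = boxes.T @ targets / len(boxes)

h_iter = best_step_descent(r_xx, r_xs)
h_direct = np.linalg.solve(r_xx, r_xs)        # closed-form minimizer for comparison
```

Unlike the constant gain of the LMS algorithm, the step size here changes at every iteration and shrinks to nothing as the gradient vanishes, so the iteration settles at the minimizer.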
B. DERIVATION OF OPTIMIZATION CRITERIA
1. Performance Function I - mMSE

The performance function based on the mMSE criterion
is derived along with the nonrecursive spatial/temporal filter.
The nonrecursive spatial and temporal filters are described
by a set of filter coefficients, the vector $\underline{H}$, over the area of a
"search-box". The observed signal in the "ith" "search-box"
is represented by the signal vector $\underline{X}_i$. The estimated target
intensity within the search-box, $\hat{S}_i$, is obtained by the linear
filter

$$\hat{S}_i = \underline{H}^T\underline{X}_i \tag{2.00}$$

This process is carried out throughout the whole image.


The nonrecursive filter is represented by the vector

$$\underline{H}^T = [H(1), H(2), \ldots, H(N)] \tag{2.01}$$

where N is the number of pixels in the filter "search-box".
The image signal within the "ith" filter "search-box" is
described by the vector:

$$\underline{X}_i^T \triangleq [X_i(1), X_i(2), \ldots, X_i(N)] \tag{2.02}$$

Throughout this thesis, matrices will be denoted by a "="
under the symbol. Vectors will be denoted by a "_" under
the symbol.
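
As an illustration of how the linear filter is applied to every search-box of an image, the following Python sketch may be useful. The function names (`search_boxes`, `apply_filter`), the row-by-row ordering of the pixels inside each box and the simple averaging filter are assumptions made for this example, not details taken from the thesis.

```python
import numpy as np


def search_boxes(image, size=3):
    """Return an array of shape (num_boxes, size*size); row i is the vector X_i."""
    rows, cols = image.shape
    out = []
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            out.append(image[r:r + size, c:c + size].ravel())
    return np.array(out)


def apply_filter(image, h, size=3):
    """Compute the estimate H^T X_i for every search-box position."""
    boxes = search_boxes(image, size)
    est = boxes @ h
    return est.reshape(image.shape[0] - size + 1, image.shape[1] - size + 1)


image = np.arange(25, dtype=float).reshape(5, 5)   # small test image
h_mean = np.full(9, 1.0 / 9.0)                     # hypothetical averaging filter
filtered = apply_filter(image, h_mean)
```

For the 5 by 5 test image there are nine search-box positions, so `filtered` is a 3 by 3 array of estimates, one per box.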


The estimation error is defined as:

$$\varepsilon_i \triangleq \hat{S}_i - S_i \tag{2.03}$$

where $S_i$ is the signal and $\hat{S}_i$ the estimated signal in the
"search-box". The superscript T denotes the transpose of the vector.

The mMSE (minimization of mean square error) performance
function is defined as:

$$J \triangleq E[\varepsilon_i^2] \tag{2.04}$$

where $E[\cdot]$ denotes the expected value. Substitution of
(2.00) and (2.03) into (2.04) gives:

$$J = E[(\underline{H}^T\underline{X}_i - S_i)(\underline{H}^T\underline{X}_i - S_i)] = E[\underline{H}^T\underline{X}_i\underline{X}_i^T\underline{H} - 2\underline{H}^T\underline{X}_iS_i + S_i^2] \tag{2.05}$$

Since the filter value is fixed for an image, it can be
moved out of the expectation operation to give:

$$J = \underline{H}^TE[\underline{X}_i\underline{X}_i^T]\underline{H} - 2\underline{H}^TE[\underline{X}_iS_i] + E[S_i^2] \tag{2.06}$$

In order to simplify (2.06), the following terms are
defined:

(1) The autocorrelation matrix $\underline{\underline{R}}_{XX}$ of the observed image is:

$$\underline{\underline{R}}_{XX} \triangleq E[\underline{X}_i\underline{X}_i^T] \tag{2.07}$$

Being a correlation matrix, it is a symmetric and positive
definite matrix.

(2) The cross correlation vector between the observed
signal and the target signal of interest is:

$$\underline{R}_{XS} \triangleq E[\underline{X}_iS_i] \tag{2.08}$$

(3) The mean square value of the target signal is:

$$d \triangleq E[S_i^2] \tag{2.09}$$

Substitution of (2.07) through (2.09) into (2.06) gives:

$$J = \underline{H}^T\underline{\underline{R}}_{XX}\underline{H} - 2\underline{H}^T\underline{R}_{XS} + d \tag{2.10}$$

Equation (2.10) is the performance function of the mMSE
criterion. It is a quadratic function in terms of the filter
vector $\underline{H}$.

[Figure: Search-box.]


Theorem 2.01
The performance function (2.10) is a unimodal (i.e.,
has a single minimum) function if the autocorrelation matrix
$\underline{\underline{R}}_{XX}$ is positive definite.

Proof
The stationary points of the function (2.10) are
found by setting the gradient of (2.10) with respect to $\underline{H}$
to zero.

$$\frac{\partial J}{\partial \underline{H}} = 2(\underline{\underline{R}}_{XX}\underline{H} - \underline{R}_{XS}) = 0 \tag{2.11}$$

Since $\underline{\underline{R}}_{XX}$ is a symmetric positive definite matrix, its
inverse exists. Therefore

$$\underline{H}^* = \underline{\underline{R}}_{XX}^{-1}\underline{R}_{XS} \tag{2.12}$$

Equation (2.12) is the optimum filter vector which minimizes
the cost function (2.10). In order to prove that the cost
function is minimized for $\underline{H}^*$, the second gradient of (2.10)
with respect to $\underline{H}$ is taken.

$$\frac{\partial^2 J}{\partial \underline{H}^2} = 2\underline{\underline{R}}_{XX} \tag{2.13}$$

Since $\underline{\underline{R}}_{XX}$ is positive definite, the cost function is
minimized. The minimum value is

$$J_{min} = d - \underline{R}_{XS}^T\underline{\underline{R}}_{XX}^{-1}\underline{R}_{XS} \tag{2.14}$$

It is obtained by substituting the optimum filter vector
into (2.10).

The second derivative of the cost function J, as
described in (2.13), is called the Hessian matrix.
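
The closed-form mMSE solution can be checked numerically with a short Python sketch. This is an illustration under assumptions: the expectations are replaced by sample averages over a set of search-box vectors, the data are synthetic, and the function names (`mmse_filter`, `cost`) are hypothetical.

```python
import numpy as np


def mmse_filter(boxes, targets):
    """Estimate R_XX, R_XS and d by sample averages and solve R_XX H* = R_XS."""
    n = boxes.shape[0]
    r_xx = boxes.T @ boxes / n            # sample estimate of E[X X^T]
    r_xs = boxes.T @ targets / n          # sample estimate of E[X S]
    d = np.mean(targets ** 2)             # sample estimate of E[S^2]
    h_opt = np.linalg.solve(r_xx, r_xs)   # H* = R_XX^{-1} R_XS
    j_min = d - r_xs @ h_opt              # J_min = d - R_XS^T R_XX^{-1} R_XS
    return h_opt, j_min, r_xx, r_xs, d


def cost(h, r_xx, r_xs, d):
    """Quadratic performance function J = H'R_XX H - 2 H'R_XS + d."""
    return h @ r_xx @ h - 2.0 * h @ r_xs + d


# Synthetic search-box vectors and target intensities (assumed data).
rng = np.random.default_rng(1)
boxes = rng.normal(size=(2000, 9))
targets = boxes @ rng.normal(size=9) + 0.1 * rng.normal(size=2000)

h_opt, j_min, r_xx, r_xs, d = mmse_filter(boxes, targets)
gradient = 2.0 * (r_xx @ h_opt - r_xs)    # should vanish at the optimum
```

Evaluating `cost` at `h_opt` reproduces `j_min`, and any perturbation of `h_opt` gives a larger cost, which is the unimodal behavior stated in Theorem 2.01.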

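If the autocorrelation matrix is singular, the cost
function (2.10) is no longer unimodal because (2.11) can be
set to zero for an infinite number of filter vectors $\underline{H}$.

It can be shown [49] that for such a case, a minimal
solution can be obtained [50, 51] by using the pseudo inverse
of $\underline{\underline{R}}_{XX}$:

$$\underline{H}^* = (\underline{\underline{R}}_{XX}^T\underline{\underline{R}}_{XX})^{+}\underline{\underline{R}}_{XX}^T\underline{R}_{XS} \tag{2.15}$$

where the superscript + denotes the pseudo inverse.
The solution is not unique.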
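
The singular case can be illustrated with a small Python sketch. The data are an assumption made for this example: the ninth pixel of every search-box vector is forced to duplicate the first, which makes the sample autocorrelation matrix singular. NumPy's `numpy.linalg.pinv` is used for the pseudo inverse; the cutoff `rcond=1e-10` is a choice made here, not a value from the thesis.

```python
import numpy as np

rng = np.random.default_rng(2)
base = rng.normal(size=(200, 8))
boxes = np.hstack([base, base[:, :1]])    # 9th pixel duplicates the 1st
targets = base @ rng.normal(size=8)

r_xx = boxes.T @ boxes / len(boxes)       # singular 9x9 autocorrelation estimate
r_xs = boxes.T @ targets / len(boxes)
rank = np.linalg.matrix_rank(r_xx, tol=1e-10)

# Minimum norm filter vector from the pseudo inverse.
h_min_norm = np.linalg.pinv(r_xx, rcond=1e-10) @ r_xs

# Adding any vector from the null space of R_XX gives another stationary point.
null_dir = np.zeros(9)
null_dir[0], null_dir[8] = 1.0, -1.0
h_other = h_min_norm + 0.5 * null_dir

residual_min = r_xx @ h_min_norm - r_xs   # both residuals vanish
residual_other = r_xx @ h_other - r_xs
```

Both filter vectors set the gradient (2.11) to zero, which shows the non-uniqueness; the pseudo inverse simply selects the one with the smallest norm.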
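2. Performance Function II - MSNR
The observed signal in the "search-box" is represented
by the vector $\underline{X}$. Let us assume that the target signal
vector $\underline{S}$ and the clutter noise vector $\underline{N}$ are additive:

$$\underline{X} = \underline{S} + \underline{N} \tag{2.16}$$

Applying the linear filter $\underline{H}$ to the input signal vector $\underline{X}$,
we obtain:

$$\underline{H}^T\underline{X} = \underline{H}^T\underline{S} + \underline{H}^T\underline{N}$$

where

$$S_0 \triangleq \underline{H}^T\underline{S} = \text{target signal after filtering} \tag{2.17}$$

$$N_0 \triangleq \underline{H}^T\underline{N} = \text{clutter noise after filtering} \tag{2.18}$$

The output signal to clutter noise ratio is then defined as:

$$J \triangleq \frac{\text{the power in the filtered image } \underline{H}^T\underline{X} \text{ due to target signal}}{\text{the power in the filtered image } \underline{H}^T\underline{X} \text{ due to clutter noise}} \tag{2.19}$$

$$J = \frac{E[S_0^2]}{E[N_0^2]} \tag{2.20}$$

where $E[\cdot]$ denotes the expected value. Substitution of
(2.17) and (2.18) into (2.20) gives:

$$J = \frac{E[(\underline{H}^T\underline{S})^2]}{E[(\underline{H}^T\underline{N})^2]} = \frac{E[\underline{H}^T\underline{S}\,\underline{S}^T\underline{H}]}{E[\underline{H}^T\underline{N}\,\underline{N}^T\underline{H}]} \tag{2.21}$$

The filter vector $\underline{H}$ can be taken out of the expectation
operation to give:

$$J = \frac{\underline{H}^TE[\underline{S}\,\underline{S}^T]\underline{H}}{\underline{H}^TE[\underline{N}\,\underline{N}^T]\underline{H}} \tag{2.22}$$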
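Let us define the signal autocorrelation matrix as:

$$\underline{\underline{R}}_{SS} \triangleq E[\underline{S}\,\underline{S}^T] \tag{2.23}$$

and the clutter noise autocorrelation matrix as:

$$\underline{\underline{R}}_{NN} \triangleq E[\underline{N}\,\underline{N}^T] \tag{2.24}$$

$\underline{\underline{R}}_{NN}$ and $\underline{\underline{R}}_{SS}$ are symmetric and positive definite. Substitution
of (2.23) and (2.24) in (2.22) yields:

$$J = \frac{\underline{H}^T\underline{\underline{R}}_{SS}\underline{H}}{\underline{H}^T\underline{\underline{R}}_{NN}\underline{H}} \tag{2.25}$$

The performance function J in (2.25) is the performance
function of the MSNR criterion.
The filter vector $\underline{H}^*$ is obtained by maximizing J in
(2.25) with respect to the filter vector $\underline{H}$.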


Theorem 2.02
The maximum of the objective function (2.25) is
equal to the largest eigenvalue of the matrix $\underline{\underline{R}}_{NN}^{-1}\underline{\underline{R}}_{SS}$, and
the optimum filter $\underline{H}^*$ is the corresponding eigenvector.

The proof is based on the Cauchy-Schwarz inequality
by finding the upper bound of J.
Since the autocorrelation matrix $\underline{\underline{R}}_{NN}$ is symmetric
and positive definite, there exists a square nonsingular
matrix $\underline{\underline{V}}$ which satisfies the relation [52]:
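$$\underline{\underline{R}}_{NN} = \underline{\underline{V}}^T\underline{\underline{V}} \tag{2.26}$$

Substitution of (2.26) into (2.25) and using the fact that

$$\underline{\underline{V}}^{-1}\underline{\underline{V}} = \underline{\underline{I}} \tag{2.27}$$

gives

$$J = \frac{\underline{H}^T\underline{\underline{V}}^T(\underline{\underline{V}}^{-1})^T\underline{\underline{R}}_{SS}\underline{\underline{V}}^{-1}\underline{\underline{V}}\,\underline{H}}{\underline{H}^T\underline{\underline{V}}^T\underline{\underline{V}}\,\underline{H}} \tag{2.28}$$

Let us define the normalized vector $\underline{W}$ as:

$$\underline{W} \triangleq \frac{\underline{\underline{V}}\,\underline{H}}{\sqrt{(\underline{\underline{V}}\,\underline{H})^T(\underline{\underline{V}}\,\underline{H})}} \tag{2.29}$$

It also satisfies the normalization condition,

$$\underline{W}^T\underline{W} = 1 \tag{2.30}$$

Substitution of (2.29) and (2.30) into (2.28) gives

$$J = \underline{W}^T(\underline{\underline{V}}^{-1})^T\underline{\underline{R}}_{SS}\underline{\underline{V}}^{-1}\underline{W} \tag{2.31}$$

Let us define the matrix $\underline{\underline{P}}$ as:

$$\underline{\underline{P}} \triangleq (\underline{\underline{V}}^{-1})^T\underline{\underline{R}}_{SS}\underline{\underline{V}}^{-1} \tag{2.32}$$

Equation (2.31) becomes

$$J = \underline{W}^T\underline{\underline{P}}\,\underline{W} \tag{2.33}$$

Using the Schwarz inequality, we obtain

$$(\underline{W}^T\underline{\underline{P}}\,\underline{W})^2 \le (\underline{W}^T\underline{W})\cdot(\underline{W}^T\underline{\underline{P}}^T\underline{\underline{P}}\,\underline{W}) \tag{2.34}$$

Since the left side of the inequality is equal to $J^2$, the
right side of (2.34) is the upper bound of $J^2$. The performance
function J reaches its maximum when the equality holds,
which occurs when: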


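$$\alpha\underline{W} = \underline{\underline{P}}\,\underline{W} \tag{2.35}$$

where $\alpha$ is a constant. Substituting (2.29) and (2.32) into
(2.35) gives (2.36).

$$\alpha\frac{\underline{\underline{V}}\,\underline{H}}{\sqrt{(\underline{\underline{V}}\,\underline{H})^T(\underline{\underline{V}}\,\underline{H})}} = (\underline{\underline{V}}^{-1})^T\underline{\underline{R}}_{SS}\underline{\underline{V}}^{-1}\frac{\underline{\underline{V}}\,\underline{H}}{\sqrt{(\underline{\underline{V}}\,\underline{H})^T(\underline{\underline{V}}\,\underline{H})}} \tag{2.36}$$

Multiplying (2.36) by

$$\sqrt{(\underline{\underline{V}}\,\underline{H})^T(\underline{\underline{V}}\,\underline{H})}\cdot\underline{\underline{V}}^T \tag{2.37}$$

we obtain:

$$\alpha\underline{\underline{V}}^T\underline{\underline{V}}\,\underline{H} = \underline{\underline{V}}^T(\underline{\underline{V}}^{-1})^T\underline{\underline{R}}_{SS}\underline{\underline{V}}^{-1}\underline{\underline{V}}\,\underline{H} \tag{2.38}$$

Substituting (2.26) and (2.27) in (2.38), we get:

$$\alpha\underline{\underline{R}}_{NN}\underline{H} = \underline{\underline{R}}_{SS}\underline{H} \tag{2.39}$$

Since $\underline{\underline{R}}_{NN}$ is a positive definite matrix, its inverse $\underline{\underline{R}}_{NN}^{-1}$
exists. Multiplying (2.39) by $\underline{\underline{R}}_{NN}^{-1}$, we obtain:

$$(\underline{\underline{R}}_{NN}^{-1}\underline{\underline{R}}_{SS} - \alpha\underline{\underline{I}})\cdot\underline{H}^* = 0 \tag{2.40}$$

where $\underline{\underline{I}}$ is the identity matrix.

Equation (2.40) is called the generalized eigenvalue
eigenvector problem [52].

Substituting (2.40) into (2.25), we obtain
the maximum value of J. One can see that

$$J_{max} = \alpha_{max} \tag{2.41}$$

In other words, $J_{max}$ is the largest eigenvalue of the matrix
$\underline{\underline{R}}_{NN}^{-1}\underline{\underline{R}}_{SS}$, and $\underline{H}^*$ is the corresponding eigenvector. The noise
correlation matrix $\underline{\underline{R}}_{NN}$ can be obtained by assuming some target
signal of interest $\underline{S}$ and using the observed signal $\underline{X}$ in the
following way (the signal and noise are assumed additive).

$$\underline{\underline{R}}_{NN} = E[(\underline{X} - \underline{S})(\underline{X} - \underline{S})^T]$$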
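
The steps of the proof of Theorem 2.02 can be followed numerically with the Python sketch below. It is an illustration under assumptions: the two correlation matrices are synthetic, the factor V is obtained from a Cholesky factorization (one possible choice of a matrix satisfying R_NN = V'V), and the function names `msnr_filter` and `snr` are hypothetical.

```python
import numpy as np


def msnr_filter(r_ss, r_nn):
    """Return (H*, J_max) for J = H'R_SS H / H'R_NN H via the factor R_NN = V'V."""
    low = np.linalg.cholesky(r_nn)        # r_nn = low @ low.T, so V = low.T
    v_inv = np.linalg.inv(low.T)
    p = v_inv.T @ r_ss @ v_inv            # P = (V^-1)' R_SS V^-1, symmetric
    vals, vecs = np.linalg.eigh(p)        # eigenvalues in ascending order
    w = vecs[:, -1]                       # unit eigenvector of the largest one
    return v_inv @ w, vals[-1]            # H* = V^-1 W, J_max = largest eigenvalue


def snr(h, r_ss, r_nn):
    """Output signal to clutter noise ratio for a filter vector h."""
    return (h @ r_ss @ h) / (h @ r_nn @ h)


# Synthetic symmetric positive definite matrices (assumed data).
rng = np.random.default_rng(5)
a = rng.normal(size=(9, 9))
b = rng.normal(size=(9, 9))
r_nn = a @ a.T + 9.0 * np.eye(9)
r_ss = b @ b.T

h_opt, j_max = msnr_filter(r_ss, r_nn)
trial_snrs = [snr(rng.normal(size=9), r_ss, r_nn) for _ in range(100)]
```

The resulting `h_opt` satisfies R_SS H = J_max R_NN H, and none of the randomly drawn trial filters exceeds `j_max`.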


Theorem 2.03

The performance function J in (2.25) is in general a
multi-modal function.

Based on Theorem 2.02, the stationary points of the
performance function J satisfy the eigenvector equation (2.40):

$$(\underline{\underline{R}}_{NN}^{-1}\underline{\underline{R}}_{SS} - \alpha\underline{\underline{I}})\cdot\underline{H}^* = 0$$

In general, this equation has n different solutions, because
the matrix $\underline{\underline{R}}_{NN}^{-1}\underline{\underline{R}}_{SS}$ can have n distinct eigenvalues,
and thus n corresponding eigenvectors. So, in general, the
performance function can have one absolute maximum and n-1
local smaller maxima.
Theorem 2.04

The performance function J is unimodal (has a single maximum) if the matrices $R_{SS}$ and $R_{NN}$ are defined as in (2.23) and (2.24).

Proof

The proof is based on the fact that $R_{SS}$ is a dyad.

Recall equation (2.40):

$$ \left( R_{NN}^{-1} R_{SS} - \frac{1}{a} I \right) \cdot H^* = 0 $$

The matrix $R_{SS}$, being a dyad, can be written as:

$$ R_{SS} = e \, e^T \qquad (2.42) $$

where e is a vector.
As mentioned before, for the nontrivial solution of (2.40), the performance function is

$$ J = \frac{1}{a} \qquad (2.43) $$

Using (2.42) and (2.43) in (2.40), we obtain

$$ R_{NN}^{-1} \cdot e \, e^T H^* = J \cdot H^* \qquad (2.44) $$

Separating (2.44) into a product of a vector and a constant, we obtain:

$$ (R_{NN}^{-1} e)(e^T H^*) = J \cdot H^* \qquad (2.45) $$

For generality, a constant g can be used in the left side of (2.45) to give:

$$ (g \cdot R_{NN}^{-1} e) \left( \frac{1}{g} e^T H^* \right) = J \cdot H^* \qquad (2.46) $$

Comparing both sides of (2.46), we get:

$$ J_{max} = \frac{1}{g} e^T H^* \qquad (2.47) $$

$$ H^* = g \cdot R_{NN}^{-1} e \qquad (2.48) $$

Equation (2.47) shows that if $R_{SS}$ is a dyad, the performance function J has a unique stationary point where it reaches its maximum. The general eigenvalue problem has a single nonzero eigenvalue $J_{max}$ and a corresponding eigenvector: $H^* = g \cdot R_{NN}^{-1} e$.

(Q.E.D.)
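The dyad result can be checked numerically. The following is a minimal sketch, not taken from the thesis: the vector `e`, the noise matrix `r_nn`, the choice g = 1, and all helper names are illustrative assumptions. It builds $R_{SS} = e\,e^T$, forms $H^* = R_{NN}^{-1} e$, and compares the value of J at H* with $e^T H^*$.

```python
# Illustrative check of the dyad case; all numbers and names are assumptions.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(m, v):
    return [dot(row, v) for row in m]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def snr(h, r_ss, r_nn):
    """Performance function J = (H^T R_SS H) / (H^T R_NN H)."""
    return dot(h, matvec(r_ss, h)) / dot(h, matvec(r_nn, h))

e = [1.0, 2.0]                                  # assumed target vector
r_ss = [[ei * ej for ej in e] for ei in e]      # dyad R_SS = e e^T
r_nn = [[2.0, 0.5], [0.5, 1.0]]                 # assumed positive definite noise matrix

h_star = matvec(inverse_2x2(r_nn), e)           # H* = R_NN^{-1} e, taking g = 1
j_max = dot(e, h_star)                          # J_max = e^T H* for g = 1
j_at_h_star = snr(h_star, r_ss, r_nn)           # J evaluated at H*
j_other = snr([1.0, 0.0], r_ss, r_nn)           # J at an arbitrary other filter
```

In this example `j_at_h_star` agrees with `j_max`, and the arbitrary filter gives a smaller J, which is what a single nonzero eigenvalue implies.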


C. DERIVATION OF SEARCHING TECHNIQUES FOR EXTREMUM: 
GRADIENT SEARCH METHODS FOR THE MINIMUM OF THE mMSE 
PERFORMANCE FUNCTIONS 


1. Steepest Descent Method (SD) and the Best Step Adaptation Gain

The steepest descent method is a gradient method that uses the Jacobian gradient ($G = \nabla_H J$) of the performance function J to determine a suitable direction of search. Gradient methods which use the Jacobian to determine the direction of search are called first order methods. Gradient methods for optimization are based on the Taylor expansion of the performance function J, as given below:

$$ J(H + \Delta H) = J(H) + G^T \Delta H + \frac{1}{2} \Delta H^T A \, \Delta H \qquad (2.49) $$

where G is the Jacobian gradient of J and A is the matrix of second order partial derivatives called the Hessian matrix.

Equation (2.49) can be written in the form:

$$ J(H + \Delta H) = J(H) + \Delta J \qquad (2.50) $$

The steepest descent uses only the Jacobian, so

$$ \Delta J = G^T \cdot \Delta H \qquad (2.51) $$

In order to minimize the performance function J, we want to generate a descending sequence of J which finally converges to the minimum of J, J*. In other words, we want a negative $\Delta J$, but:

$$ \Delta J = G^T \Delta H = |G| \cdot |\Delta H| \cdot \cos\phi $$

where $\phi$ is the angle between the two vectors, G and $\Delta H$. For maximum reduction of the cost function J,

$$ \phi = \pi \qquad (2.52) $$

From (2.52), it is obvious that the change $\Delta H$ in the filter vector H should be in the direction of the negative gradient -G. This direction is called the steepest descent direction.

The steepest descent step $\Delta H$ can be written in the form:

$$ \Delta H = -\alpha \cdot G \qquad (2.53) $$

where -G is called the step direction gradient and $\alpha$ the step size. In adaptive filter terminology, $\alpha$ is called the adaptation gain.

In order to generate an iterative method, one can represent the filter vector $H + \Delta H$ as $H_{K+1}$ and H as $H_K$. Thus,

$$ H_{K+1} = H_K + \Delta H \qquad (2.54) $$

Substituting (2.53) in (2.54), we obtain:

$$ H_{K+1} = H_K - \alpha_K \cdot G_K \qquad (2.55) $$

Equation (2.55) is called the steepest descent iterative method. For simplicity and without losing generality, the negative sign will be included in $\alpha_K$. Thus (2.55) becomes:

$$ H_{K+1} = H_K + \alpha_K G_K \qquad (2.56) $$

If very small values of $\{\alpha_K\}$ are selected, the sequence $\{H_K\}$ will converge very slowly. In order to increase the speed of convergence substantially, we choose the step sizes which provide the biggest descent at each step. This concept is called the "best step". The adaptation gain $\alpha_K$ is picked to minimize $J(H_{K+1})$. The search for $\alpha_K$ constitutes a one-dimensional minimization of the performance function $J(H_{K+1})$.


Lemma 2.05

Let $J(H_K)$ be the performance function to be minimized. Let the filter vector $H_{K+1}$ be updated by the steepest descent method (2.56), then the "best step" towards the minimum of J is obtained in every iteration if the adaptation gain satisfies the relationship:

$$ G_{K+1}^T G_K = 0 \qquad (2.57) $$

where $G_K$ is the Jacobian gradient of J with respect to the filter vector H.

Proof

The performance function $J(H_{K+1})$ can be expressed as:

$$ J(H_{K+1}) = J(H_K + \alpha_K G_K) \qquad (2.58) $$

The task is to find $\alpha_K$ which minimizes $J(H_{K+1})$ by setting the derivative of $J(H_{K+1})$ with respect to $\alpha_K$ to zero:

$$ \frac{d J(H_{K+1})}{d \alpha_K} = 0 \qquad (2.59) $$

but $H_{K+1}$ is a function of $\alpha_K$ and of $G_K$ as shown in (2.56). Thus (2.59) becomes:

$$ \left( \frac{\partial J(H_{K+1})}{\partial H_{K+1}} \right)^T \cdot \frac{d H_{K+1}}{d \alpha_K} = 0 \qquad (2.60) $$

Since $G_{K+1} = \nabla_{H_{K+1}} J$ and $\frac{d H_{K+1}}{d \alpha_K} = G_K$, (2.60) becomes:

$$ G_{K+1}^T G_K = 0 \qquad (2.61) $$

From (2.61), the best step concept requires orthogonality between the two gradient vectors, $G_{K+1}$ and $G_K$.

(Q.E.D.)

Up to this point, the cost function J was not specified, and the derivation of the steepest descent was made for any continuous differentiable function.

The mMSE performance function as given by (2.10) can be written as:

$$ J_K = H_K^T R_{XX} H_K - 2 H_K^T R_{XS} + c \qquad (2.62) $$

where c denotes the term of (2.10) that does not depend on $H_K$. The gradient $G_K$ of $J_K$ with respect to $H_K$ is given by:

$$ G_K \triangleq \nabla_{H_K} J_K = 2 (R_{XX} H_K - R_{XS}) \qquad (2.63) $$

From (2.63) and (2.56), $G_{K+1}$ can be expressed as:

$$ G_{K+1} = 2 (R_{XX} H_{K+1} - R_{XS}) = 2 (R_{XX} (H_K + \alpha_K G_K) - R_{XS}) = G_K + 2 \alpha_K \cdot R_{XX} \cdot G_K \qquad (2.64) $$

Lemma (2.05) is used in (2.64) to compute the best step $\alpha_K$. Since $G_{K+1}^T G_K = 0$ by Lemma (2.05), it follows from (2.64) that

$$ (G_K + 2 \alpha_K R_{XX} G_K)^T G_K = 0 \qquad (2.65) $$

$R_{XX}$ is a symmetric matrix, thus $R_{XX}^T = R_{XX}$. Using this fact in (2.65), we obtain

$$ \alpha_K = - \frac{1}{2} \cdot \frac{G_K^T G_K}{G_K^T R_{XX} G_K} \qquad (2.66) $$

Equation (2.66) is the equation of the best step for the steepest descent method.
Combining the results from (2.56), (2.63) and (2.66), we obtain the steepest descent adaptive filter:

[Step 1] Set a starting filter vector $H_0$, stopping bound (i.e., max. acceptable adaptation error) $\epsilon$, the correlation matrix $R_{XX}$ of the observed signal, and the cross correlation vector $R_{XS}$.

[Step 2] Compute the gradient:

$$ G_K = 2 (R_{XX} H_K - R_{XS}) $$

[Step 3] Compute the adaptation gain:

$$ \alpha_K = - \frac{1}{2} \cdot \frac{G_K^T G_K}{G_K^T R_{XX} G_K} $$

[Step 4] Update the filter vector:

$$ H_{K+1} = H_K + \alpha_K G_K $$

[Step 5] Test for stopping condition:

If $G_K^T G_K < \epsilon$, then terminate. Otherwise, go to Step 2.

The stopping criterion is chosen as $G_K^T G_K < \epsilon$ because the performance function is unimodal (has a single stationary point), and we are looking for the stationary point, which in fact satisfies the vanishing of the gradient.
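The steps above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the 2x2 correlation matrix, the cross-correlation vector, and the function names are invented for the example, and the gradient is taken to be $2(R_{XX} H - R_{XS})$ with the sign of the step absorbed into the gain, as in the text.

```python
# Sketch of Steps 1-5 for the quadratic mMSE cost; data and names are assumptions.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(m, v):
    return [dot(row, v) for row in m]

def steepest_descent(r_xx, r_xs, h0, eps=1e-20, max_iter=10000):
    """Minimize J = H^T R_XX H - 2 H^T R_XS + const using the best-step gain."""
    h = list(h0)
    for k in range(max_iter):
        # Step 2: gradient G_K = 2 (R_XX H_K - R_XS)
        g = [2.0 * (a - b) for a, b in zip(matvec(r_xx, h), r_xs)]
        # Step 5 (checked first): stop when G_K^T G_K < eps
        gg = dot(g, g)
        if gg < eps:
            return h, k
        # Step 3: best-step gain alpha_K = -(1/2) G^T G / (G^T R_XX G)
        alpha = -0.5 * gg / dot(g, matvec(r_xx, g))
        # Step 4: update H_{K+1} = H_K + alpha_K G_K
        h = [hi + alpha * gi for hi, gi in zip(h, g)]
    return h, max_iter

r_xx = [[2.0, 0.5], [0.5, 1.0]]   # assumed correlation matrix (positive definite)
r_xs = [1.0, 1.0]                 # assumed cross-correlation vector
h_opt, iterations = steepest_descent(r_xx, r_xs, [0.0, 0.0])
```

For this data the minimizer solves $R_{XX} H = R_{XS}$, so `h_opt` should approach (2/7, 6/7).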
feeenecelerated Steepest Descent Method (ASD) 

The accelerated steepest descent method was first 
introduced in 1964 by Shah, Buehler and Kempthose [ 53 ] 
Its purpose was to accelerate the convergence of the standard 
steepest descent method. Its concept was incorporated in an 
algorithm which converges to the minimum of any n dimensional 
quadratic function in no more than 2en-1 steps. Practically, 
this algorithm is not very efficient because of its sensi- 
mivity tO error propagation. For large n, the error propa- 


gation affected the convergence rate and the method sometimes 
49 





converges as slowly or even more slowly than tne steepest 


descent method. 


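The adaptation gain of the ASD method is computed using Lemma 2.05 and the fact that the adaptive filter H is updated by the iterative equation:

$$ H_{K+1} = H_K + \alpha_K \cdot V_K \qquad (2.67) $$

From Lemma (2.05) and (2.67),

$$ G_{K+1}^T V_K = 0 \qquad (2.68) $$

but, using (2.63),

$$ G_{K+1} = 2 (R_{XX} H_{K+1} - R_{XS}) = 2 (R_{XX} (H_K + \alpha_K V_K) - R_{XS}) = 2 (R_{XX} H_K - R_{XS}) + 2 \alpha_K R_{XX} V_K = G_K + 2 \alpha_K R_{XX} V_K \qquad (2.69) $$

Using (2.69) and (2.68), we obtain:

$$ (G_K + 2 \alpha_K R_{XX} V_K)^T V_K = 0 \qquad (2.70) $$

$$ \alpha_K = - \frac{1}{2} \cdot \frac{G_K^T V_K}{V_K^T R_{XX} V_K} \qquad (2.71) $$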


The accelerated steepest descent adaptive filter is carried out by the following steps:

[Step 1] Set a starting filter vector, $H_0 = H_1$, stopping bound $\epsilon$, the correlation matrix $R_{XX}$ and the cross correlation vector $R_{XS}$, and the gradient $G_0$.

[Step 2] Compute the gradient $G_K$ of J:

$$ G_K = 2 (R_{XX} H_K - R_{XS}) $$

[Step 3] Compute the step direction vector $V_K$:


. Gy for K = 2, 4, 6 
Vy = 
Hy, - Hy, tO FTC Iisa S852 7 
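[Step 4] Compute the adaptation gain $\alpha_K$:

$$ \alpha_K = - \frac{1}{2} \cdot \frac{G_K^T V_K}{V_K^T R_{XX} V_K} $$

[Step 5] Update the filter vector $H_K$:

$$ H_{K+1} = H_K + \alpha_K V_K $$

[Step 6] Test for stopping condition.

If $G_K^T G_K \le \epsilon$, terminate. Otherwise go to Step 2.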


3. Amir's Method (AMM)

This method was suggested by this author at the beginning of the research. The purpose was to derive a method which will converge faster than the steepest descent method. Experiments showed that the AMM method converges approximately three times faster than the SD method as shown in Fig. 2.6a. This method is a non-conjugate gradient method and is not as fast as the conjugate gradient methods. But it can replace the SD method as a robust and faster method.

The AMM gradient search method was designed based on the fact that the gradient of a unimodal performance function vanishes only once, at the stationary point of the performance function, which is the extremum point we are looking for.

The adaptation procedure is derived in the following.


The functional $\Psi_K$ is defined as:

$$ \Psi_K = G_K^T G_K \qquad (2.71-1) $$

where $G_K$ is the gradient of the performance function $J_K$ as in (2.63).

The adaptive filter $H_{K+1}$ is updated as given in (2.56) for the SD method. The adaptation gain $\alpha_K$ is computed according to the "best step" concept, to minimize $\Psi_{K+1}$. Using (2.64), we obtain:

$$ \Psi_{K+1} = G_{K+1}^T G_{K+1} \qquad (2.71-2) $$

$$ = (G_K + 2 \alpha_K R_{XX} G_K)^T (G_K + 2 \alpha_K R_{XX} G_K) = G_K^T (I + 2 \alpha_K R_{XX})^T (I + 2 \alpha_K R_{XX}) G_K $$

$$ \Psi_{K+1} = G_K^T (I + 4 \alpha_K R_{XX} + 4 \alpha_K^2 R_{XX}^2) G_K \qquad (2.71-3) $$

In order to get the best step, we take the derivative of $\Psi_{K+1}$ with respect to the adaptation gain $\alpha_K$ and set it to zero:

$$ \frac{d \Psi_{K+1}}{d \alpha_K} = G_K^T (4 R_{XX} + 8 \alpha_K R_{XX}^2) G_K = 0 \qquad (2.71-4) $$

Solve (2.71-4) and get:

$$ \alpha_K = - \frac{1}{2} \cdot \frac{G_K^T R_{XX} G_K}{G_K^T R_{XX}^2 G_K} \qquad (2.71-5) $$

The AMM adaptive filter is implemented by the following steps:

[Step 1] Set initial filter vector $H_0$, the stopping bound $\epsilon$, the correlation matrix $R_{XX}$ and the cross correlation vector $R_{XS}$.

[Step 2] Compute the gradient $G_K$ of the performance function $J_K$:

$$ G_K = 2 (R_{XX} H_K - R_{XS}) $$

[Step 3] Compute the adaptation gain $\alpha_K$:

$$ \alpha_K = - \frac{1}{2} \cdot \frac{G_K^T R_{XX} G_K}{G_K^T R_{XX}^2 G_K} $$

[Step 4] Update the filter vector $H_K$:

$$ H_{K+1} = H_K + \alpha_K G_K $$

[Step 5] Test for stopping condition.

If $\Psi_K \le \epsilon$, then terminate, otherwise go to Step 2.
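A small sketch of the AMM iteration follows. It is an illustration only: it assumes the gradient $G_K = 2(R_{XX} H_K - R_{XS})$ and the update $H_{K+1} = H_K + \alpha_K G_K$, for which the gain that minimizes $\Psi_{K+1} = G_{K+1}^T G_{K+1}$ along $G_K$ works out to $-\frac{1}{2} G_K^T R_{XX} G_K / (G_K^T R_{XX}^2 G_K)$. The test data and function names are hypothetical.

```python
# Sketch of the AMM steps: choose the gain that minimizes the squared gradient norm.
# Data and names are illustrative assumptions.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(m, v):
    return [dot(row, v) for row in m]

def amm(r_xx, r_xs, h0, eps=1e-20, max_iter=10000):
    h = list(h0)
    for k in range(max_iter):
        # Step 2: gradient G_K = 2 (R_XX H_K - R_XS)
        g = [2.0 * (a - b) for a, b in zip(matvec(r_xx, h), r_xs)]
        # Step 5 (checked first): Psi_K = G_K^T G_K
        psi = dot(g, g)
        if psi <= eps:
            return h, k
        # Step 3: for symmetric R_XX, G^T R_XX^2 G = (R_XX G)^T (R_XX G)
        rg = matvec(r_xx, g)
        alpha = -0.5 * dot(g, rg) / dot(rg, rg)
        # Step 4: update H_{K+1} = H_K + alpha_K G_K
        h = [hi + alpha * gi for hi, gi in zip(h, g)]
    return h, max_iter

r_xx = [[2.0, 0.5], [0.5, 1.0]]   # assumed positive definite correlation matrix
r_xs = [1.0, 1.0]                 # assumed cross-correlation vector
h_opt, iterations = amm(r_xx, r_xs, [0.0, 0.0])
```

With this gain the squared gradient norm cannot increase from one iteration to the next, which is the property the method is built around.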


4. Fletcher-Reeves Conjugate Gradient Method (CGF)

The Fletcher-Reeves conjugate gradient (CGF) was first introduced in 1964 by Fletcher and Reeves [69]. The method is similar to the pioneering work of Hestenes and Stiefel [54]. The CGF method uses conjugate vectors as step directions.

Definition

The vectors $V_1, V_2, \ldots, V_n$ are said to be "conjugate" with respect to the matrix $R_{XX}$ if they satisfy the following condition:

$$ V_i^T R_{XX} V_j = 0, \qquad i \ne j $$

The importance of this method is its fast convergence rate for quadratic functions like (2.10). This method is proved to converge in n steps, apart from rounding errors, where n is the dimension of the filter vector.

The adaptation gain of the CGF method is computed using Lemma 2.05 and the fact that the adaptive filter $H_{K+1}$ is updated by the iterative equation in (2.67). Following the equations (2.67) up to (2.71) in a similar way, we obtain the adaptation gain as:

$$ \alpha_K = - \frac{1}{2} \cdot \frac{G_K^T V_K}{V_K^T R_{XX} V_K} \qquad (2.72) $$

The step direction vector $V_K$ is computed by the following iterative procedure [55]:

$$ V_{K+1} = - G_{K+1} + \beta_K V_K \qquad (2.73) $$

$$ \beta_K = \frac{G_{K+1}^T G_{K+1}}{G_K^T G_K} \qquad (2.74) $$


The method of CGF was once applied to the Rosenbrock function [54]. The performance result was poor. Subsequently, it was suggested to restart the method every n iterations, where n is the dimension of the vector H. This thesis confirmed that the convergence of this method for our two performance functions (2.10) and (2.25) is faster if this method of restarts is used.

The CGF adaptive filter is carried out in the following steps:

[Step 1] Select a starting filter vector $H_0$, the stopping bound $\epsilon$, the auto-correlation matrix $R_{XX}$ and the cross-correlation vector $R_{XS}$.

[Step 2] Compute the gradient $G_K$ of the performance function J:

$$ G_K = 2 (R_{XX} H_K - R_{XS}) $$

[Step 3] Compute the step direction vector $V_K$:

$$ V_K = \begin{cases} - G_K & \text{if } K \text{ Mod } n = 0 \\ - G_K + \dfrac{G_K^T G_K}{G_{K-1}^T G_{K-1}} V_{K-1} & \text{else} \end{cases} $$

[Step 4] Compute the adaptation gain:

$$ \alpha_K = - \frac{1}{2} \cdot \frac{G_K^T V_K}{V_K^T R_{XX} V_K} $$

[Step 5] Update the filter vector $H_{K+1}$:

$$ H_{K+1} = H_K + \alpha_K V_K $$

[Step 6] Test for stopping condition.

If $G_K^T G_K \le \epsilon$, terminate the adaptation. Otherwise go to Step 2.
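The CGF steps, including the restart every n iterations, can be sketched as follows. This is a hedged illustration rather than the thesis implementation: the 3x3 matrix, the vector `r_xs`, and the function name `cgf` are assumptions, and the cost is taken to be the quadratic with gradient $2(R_{XX} H - R_{XS})$.

```python
# Sketch of the CGF steps with a restart every n iterations.
# The 3x3 data and all names are illustrative assumptions.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(m, v):
    return [dot(row, v) for row in m]

def cgf(r_xx, r_xs, h0, eps=1e-20, max_iter=100):
    """Fletcher-Reeves search on J = H^T R_XX H - 2 H^T R_XS + const."""
    n = len(h0)
    h = list(h0)
    g_prev, v_prev = None, None
    for k in range(max_iter):
        # Step 2: gradient G_K = 2 (R_XX H_K - R_XS)
        g = [2.0 * (a - b) for a, b in zip(matvec(r_xx, h), r_xs)]
        gg = dot(g, g)
        # Step 6 (checked first): stopping test on G_K^T G_K
        if gg <= eps:
            return h, k
        # Step 3: restart with -G_K every n iterations, else add the conjugate term
        if k % n == 0:
            v = [-gi for gi in g]
        else:
            beta = gg / dot(g_prev, g_prev)
            v = [-gi + beta * vi for gi, vi in zip(g, v_prev)]
        # Step 4: best-step gain alpha_K = -(1/2) G^T V / (V^T R_XX V)
        alpha = -0.5 * dot(g, v) / dot(v, matvec(r_xx, v))
        # Step 5: update H_{K+1} = H_K + alpha_K V_K
        h = [hi + alpha * vi for hi, vi in zip(h, v)]
        g_prev, v_prev = g, v
    return h, max_iter

r_xx = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]  # assumed, positive definite
r_xs = [1.0, 2.0, 3.0]                                       # assumed
h_opt, iterations = cgf(r_xx, r_xs, [0.0, 0.0, 0.0])
```

On a quadratic cost of dimension n = 3, the search is expected to finish in about three iterations, apart from rounding errors.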


5. Polak-Ribière Conjugate Gradient Method (CGP)

The Polak-Ribière conjugate gradient (CGP) method is similar to the CGF method. The difference is in the computation of the search direction when K Mod n ≠ 0. In [56], Powell gave a theoretical reason for favoring the Polak-Ribière algorithm. In this thesis, the author found the CGP method more efficient and faster to converge than the CGF method (see Section F).

The search direction of the CGP method is given by the following expression:

$$ V_K = - G_K + \frac{G_K^T (G_K - G_{K-1})}{G_{K-1}^T G_{K-1}} V_{K-1} \qquad (2.75) $$


The CGP adaptive filter is carried out in the following steps:

[Step 1] Select a starting filter vector $H_0$, the stopping bound $\epsilon$, the auto-correlation matrix $R_{XX}$ and the cross-correlation vector $R_{XS}$.

[Step 2] Compute the gradient $G_K$ of the performance function J:

$$ G_K = \nabla J = 2 (R_{XX} H_K - R_{XS}) $$

[Step 3] Compute the step direction vector $V_K$:

$$ V_K = \begin{cases} - G_K & \text{if } K \text{ Mod } n = 0 \\ - G_K + \dfrac{G_K^T (G_K - G_{K-1})}{G_{K-1}^T G_{K-1}} V_{K-1} & \text{else} \end{cases} $$

[Step 4] Compute the adaptation gain:

$$ \alpha_K = - \frac{1}{2} \cdot \frac{G_K^T V_K}{V_K^T R_{XX} V_K} $$

[Step 5] Update the filter vector $H_{K+1}$:

$$ H_{K+1} = H_K + \alpha_K V_K $$

[Step 6] Test for stopping condition.

If $G_K^T G_K \le \epsilon$, terminate the adaptation. Otherwise go to Step 2.
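The CGP direction rule can be sketched in the same way. The example below is an illustration under assumptions (the test data and names are invented). It uses the Polak-Ribière coefficient for the search and also records the Fletcher-Reeves coefficient at each non-restart step; on a quadratic cost with the exact best step, consecutive gradients are orthogonal, so the two coefficients should agree up to rounding.

```python
# Sketch of the CGP steps; also records both direction coefficients for comparison.
# Data and names are illustrative assumptions.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(m, v):
    return [dot(row, v) for row in m]

def cgp(r_xx, r_xs, h0, eps=1e-20, max_iter=100):
    """Polak-Ribiere search on J = H^T R_XX H - 2 H^T R_XS + const."""
    n = len(h0)
    h = list(h0)
    g_prev, v_prev = None, None
    betas = []  # (Polak-Ribiere, Fletcher-Reeves) pairs at non-restart steps
    for k in range(max_iter):
        g = [2.0 * (a - b) for a, b in zip(matvec(r_xx, h), r_xs)]
        if dot(g, g) <= eps:
            return h, k, betas
        if k % n == 0:
            v = [-gi for gi in g]                     # restart direction
        else:
            diff = [a - b for a, b in zip(g, g_prev)]
            beta_pr = dot(g, diff) / dot(g_prev, g_prev)
            beta_fr = dot(g, g) / dot(g_prev, g_prev)
            betas.append((beta_pr, beta_fr))
            v = [-gi + beta_pr * vi for gi, vi in zip(g, v_prev)]
        alpha = -0.5 * dot(g, v) / dot(v, matvec(r_xx, v))
        h = [hi + alpha * vi for hi, vi in zip(h, v)]
        g_prev, v_prev = g, v
    return h, max_iter, betas

r_xx = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]  # assumed, positive definite
r_xs = [1.0, 2.0, 3.0]                                       # assumed
h_opt, iterations, betas = cgp(r_xx, r_xs, [0.0, 0.0, 0.0])
```

The two rules are only expected to differ when the cost is not quadratic or the gain is not the exact best step, which is the situation of the MSNR performance function.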
6. Davidon-Fletcher-Powell Variable Metric Method (DFP)

One of the most efficient searching methods is the Davidon-Fletcher-Powell (DFP) method. It was developed by Fletcher and Powell [57] from the variable metric method due to Davidon [54,58]. The variable metric term was coined by Davidon to describe methods which at the K-th iteration utilize the increment of the form

$$ \Delta H_K = \alpha_K A_K G_K \qquad (2.76) $$

and update the metric-correction transformation $A_K$ from iteration to iteration. The DFP method updates the metric $A_K$ by the iterative expression:

$$ A_{K+1} = A_K + \alpha_K \frac{V_K V_K^T}{V_K^T P_K} - \frac{A_K P_K P_K^T A_K}{P_K^T A_K P_K} \qquad (2.77) $$

where:

$$ V_K = A_K G_K \qquad (2.78) $$

$$ P_K = G_{K+1} - G_K \qquad (2.79) $$

Fletcher and Powell proved that, for a general function J, a positive definite $A_K$ implies that $A_{K+1}$ is also positive definite [58]. For the performance function J given in (2.10), it can be shown [59] that the set $\{ \alpha_K A_K \cdot G_K \}$ is a set of conjugate directions. So the DFP method exhibits quadratic termination in n steps.

The adaptation gain of the DFP adaptive filter is based on the best step concept introduced in Lemma 2.05. Since the filter update of the DFP method is:

$$ H_{K+1} = H_K + \alpha_K V_K $$

the adaptation gain is found to be:

$$ \alpha_K = - \frac{1}{2} \cdot \frac{G_K^T V_K}{V_K^T R_{XX} V_K} $$

The adaptive filter designed by the DFP method is carried out in the following steps:

[Step 1] Select a starting filter vector $H_0$, the starting correction metric $A_0 = I$ (where I is the identity matrix), the gradient $G_0$, the stopping bound $\epsilon$, the autocorrelation matrix $R_{XX}$, and the crosscorrelation vector $R_{XS}$.

[Step 2] Compute the step direction vector $V_K$:

$$ V_K = A_K G_K $$

[Step 3] Compute the adaptation gain:

$$ \alpha_K = - \frac{1}{2} \cdot \frac{G_K^T V_K}{V_K^T R_{XX} V_K} $$

[Step 4] Update the filter vector $H_K$:

$$ H_{K+1} = H_K + \alpha_K V_K $$

[Step 5] Compute the gradient $G_{K+1}$ of the performance function J:

$$ G_{K+1} = 2 (R_{XX} H_{K+1} - R_{XS}) $$

[Step 6] Compute the vector $P_K$:

$$ P_K = G_{K+1} - G_K $$

[Step 7] Update the variable metric $A_K$:

$$ A_{K+1} = A_K + \alpha_K \frac{V_K V_K^T}{V_K^T P_K} - \frac{A_K P_K P_K^T A_K}{P_K^T A_K P_K} $$

[Step 8] Test for stopping criterion.

If $G_{K+1}^T G_{K+1} < \epsilon$, terminate the adaptation, otherwise go to Step 2.
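A compact sketch of the DFP steps on a quadratic cost is given below. It is an illustration under assumptions: the data and names are invented, the gradient is taken as $2(R_{XX} H - R_{XS})$, and the metric update is written in the form $A_{K+1} = A_K + \alpha_K V_K V_K^T / (V_K^T P_K) - A_K P_K P_K^T A_K / (P_K^T A_K P_K)$, which equals the usual DFP update with step $\alpha_K V_K$ and gradient change $P_K$.

```python
# Sketch of the DFP steps on a quadratic cost; data and names are assumptions.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(m, v):
    return [dot(row, v) for row in m]

def dfp(r_xx, r_xs, h0, eps=1e-20, max_iter=100):
    n = len(h0)
    h = list(h0)
    # Step 1: metric A_0 = I and gradient G_0 = 2 (R_XX H_0 - R_XS)
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = [2.0 * (x - y) for x, y in zip(matvec(r_xx, h), r_xs)]
    for k in range(max_iter):
        # Step 8 (checked at the top of the loop): stop on a small gradient
        if dot(g, g) < eps:
            return h, k
        v = matvec(a, g)                                     # Step 2: V_K = A_K G_K
        alpha = -0.5 * dot(g, v) / dot(v, matvec(r_xx, v))   # Step 3: best-step gain
        h = [hi + alpha * vi for hi, vi in zip(h, v)]        # Step 4: filter update
        g_new = [2.0 * (x - y) for x, y in zip(matvec(r_xx, h), r_xs)]  # Step 5
        p = [x - y for x, y in zip(g_new, g)]                # Step 6: P_K
        ap = matvec(a, p)                                    # A_K P_K (A_K is symmetric)
        vp = dot(v, p)
        pap = dot(p, ap)
        # Step 7: variable metric update
        a = [[a[i][j] + alpha * v[i] * v[j] / vp - ap[i] * ap[j] / pap
              for j in range(n)] for i in range(n)]
        g = g_new
    return h, max_iter

r_xx = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]  # assumed, positive definite
r_xs = [1.0, 2.0, 3.0]                                       # assumed
h_opt, iterations = dfp(r_xx, r_xs, [0.0, 0.0, 0.0])
```

Because the gain is the exact best step on a quadratic, the search is expected to terminate in about n = 3 iterations here.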


D. DERIVATION OF SEARCHING TECHNIQUES FOR EXTREMUM, 
GRADIENT SEARCH METHODS FOR THE MAXIMUM OF THE 
MSNR PERFORMANCE FUNCTION 


1. Approximation for Best Step Adaptation Gain

The maximization of signal to noise ratio performance function J, as defined in (2.25), is a non-linear performance function of the filter vector H. The function J being non-quadratic introduces new difficulties. The methods which have been theoretically proved to converge in N steps for quadratic cases like the mMSE no longer converge as fast. The adaptation gain can no longer be efficiently computed by the best step concept because of the large amount of computation required to obtain the best step. In order to make this gradient search method efficient, the adaptation gain is approximated by the "best step" concept to generate a nondecreasing sequence of performance functions $\{J_K\}$ which finally converges to the maximum of J.


Lemma 2.06

Let the performance function J be defined as in (2.25), and the adaptive filter be updated according to (2.67), then the best step adaptation gain at iteration step K satisfies the relation:

$$ \alpha_K = - \frac{H_K^T (R_{SS} - J_{K+1} \cdot R_{NN}) \cdot V_K}{V_K^T (R_{SS} - J_{K+1} \cdot R_{NN}) V_K} \qquad (2.81) $$

Proof

The proof is based on Lemma 2.05. Using (2.67) and Lemma 2.05, we obtain:

$$ G_{K+1}^T V_K = 0 \qquad (2.82) $$

where $G_{K+1}$ is the gradient of the performance function J at the K+1 iteration step, and $V_K$ is the step direction (search direction) vector.

But according to (2.25),

$$ J_{K+1} = \frac{H_{K+1}^T R_{SS} H_{K+1}}{H_{K+1}^T R_{NN} H_{K+1}} $$

The gradient of $J_{K+1}$ with respect to $H_{K+1}$ is:

$$ G_{K+1} = \nabla_{H_{K+1}} J_{K+1} = \frac{2}{H_{K+1}^T R_{NN} H_{K+1}} (R_{SS} - J_{K+1} R_{NN}) H_{K+1} \qquad (2.83) $$

Using (2.83) in (2.82), we obtain:

$$ \frac{2}{H_{K+1}^T R_{NN} H_{K+1}} \cdot H_{K+1}^T (R_{SS} - J_{K+1} R_{NN})^T V_K = 0 $$

But according to (2.67),

$$ H_{K+1} = H_K + \alpha_K V_K $$

and $R_{SS}$, $R_{NN}$ being symmetric and positive definite gives:

$$ (H_K + \alpha_K V_K)^T (R_{SS} - J_{K+1} R_{NN}) V_K = 0 $$

$$ H_K^T (R_{SS} - J_{K+1} R_{NN}) V_K + \alpha_K \cdot V_K^T (R_{SS} - J_{K+1} R_{NN}) V_K = 0 $$

So,

$$ \alpha_K = - \frac{H_K^T (R_{SS} - J_{K+1} R_{NN}) V_K}{V_K^T (R_{SS} - J_{K+1} R_{NN}) V_K} \qquad (2.86) $$

Q.E.D.
The adaptation gain $\alpha_K$ in (2.86) cannot be obtained because it is a function of $J_{K+1}$, which itself cannot be computed without $\alpha_K$. Thus, the "best step" concept introduces a nonlinear problem for the MSNR performance function. In order to overcome this problem of solving a nonlinear equation in each iteration, the adaptation gain will be approximated by using $J_K$ instead of $J_{K+1}$. Since $J_K$ is obtained one step prior to $\alpha_K$, $J_{K+1}$ does not need to be solved. Now we must prove that this choice of adaptation gain for the MSNR performance function will generate a nondecreasing sequence $\{J_K\}$ which eventually converges to the optimum J*.


Lemma 2.07

Let the performance function J be defined as in (2.25) and the filter $H_K$ be updated by (2.67). Let the adaptation gain $\alpha_K$ be given by

$$ \alpha_K = - \frac{H_K^T (R_{SS} - J_K R_{NN}) V_K}{V_K^T (R_{SS} - J_K R_{NN}) V_K} \qquad (2.87) $$

Then it will generate a nondecreasing sequence $\{J_K\}$ which converges to the maximum J.


Proof

Using (2.25), we obtain:

$$ J_{K+1} = \frac{H_{K+1}^T R_{SS} H_{K+1}}{H_{K+1}^T R_{NN} H_{K+1}} \qquad (2.88) $$

Substitution of (2.67) in (2.88) gives:

$$ J_{K+1} = \frac{(H_K + \alpha_K V_K)^T R_{SS} (H_K + \alpha_K V_K)}{(H_K + \alpha_K V_K)^T R_{NN} (H_K + \alpha_K V_K)} = J_K \cdot \frac{1 + 2 \alpha_K \dfrac{V_K^T R_{SS} H_K}{H_K^T R_{SS} H_K} + \alpha_K^2 \dfrac{V_K^T R_{SS} V_K}{H_K^T R_{SS} H_K}}{1 + 2 \alpha_K \dfrac{V_K^T R_{NN} H_K}{H_K^T R_{NN} H_K} + \alpha_K^2 \dfrac{V_K^T R_{NN} V_K}{H_K^T R_{NN} H_K}} \qquad (2.89) $$

Equation (2.89) is simplified due to the fact that $R_{SS}$, $R_{NN}$ are symmetric and positive definite.

In order to obtain a non-decreasing sequence $\{J_K\}$, we must satisfy $J_{K+1} \ge J_K$, but the sequence $\{J_K\}$ is positive, so:

$$ \frac{J_{K+1}}{J_K} \ge 1 \qquad (2.90) $$

In order for (2.89) to satisfy (2.90), it can be seen that:

$$ 1 + 2 \alpha_K \frac{V_K^T R_{SS} H_K}{H_K^T R_{SS} H_K} + \alpha_K^2 \frac{V_K^T R_{SS} V_K}{H_K^T R_{SS} H_K} \ge 1 + 2 \alpha_K \frac{V_K^T R_{NN} H_K}{H_K^T R_{NN} H_K} + \alpha_K^2 \frac{V_K^T R_{NN} V_K}{H_K^T R_{NN} H_K} \qquad (2.91) $$

Using (2.91), (2.25) and the fact that $R_{NN}$, $R_{SS}$ are positive definite matrices, we obtain:

$$ 2 \cdot V_K^T R_{SS} H_K + \alpha_K V_K^T R_{SS} V_K \ge J_K (2 \cdot V_K^T R_{NN} H_K + \alpha_K V_K^T R_{NN} V_K) \qquad (2.92) $$

$$ \alpha_K \cdot V_K^T (R_{SS} - J_K R_{NN}) V_K \ge - 2 \cdot V_K^T (R_{SS} - J_K R_{NN}) H_K $$

$$ \alpha_K \ge - 2 \cdot \frac{V_K^T (R_{SS} - J_K R_{NN}) H_K}{V_K^T (R_{SS} - J_K R_{NN}) V_K} \qquad (2.93) $$

So the adaptation gain given in (2.87) generates a non-decreasing sequence $\{J_K\}$ because it satisfies (2.93).

Q.E.D.


Lemma 2.08

Let the performance function J be defined as in (2.25) and the filter vector $H_K$ be updated by (2.67), then for each iteration step K, the gradient $G_K$ of J is orthogonal to the filter vector $H_K$ regardless of the adaptation gain.

Proof

The performance function given by (2.25) is

$$ J_K = \frac{H_K^T R_{SS} H_K}{H_K^T R_{NN} H_K} \qquad (2.25) $$

The gradient of $J_K$ with respect to the filter vector $H_K$ is given by

$$ G_K \triangleq \nabla_{H_K} J_K = \frac{2}{H_K^T R_{NN} H_K} (R_{SS} - J_K \cdot R_{NN}) \cdot H_K \qquad (2.83) $$

From (2.83), it follows that:

$$ H_K^T G_K = \frac{2}{H_K^T R_{NN} H_K} \cdot H_K^T (R_{SS} - J_K R_{NN}) \cdot H_K \qquad (2.94) $$

$$ H_K^T G_K = 2 \cdot \left( \frac{H_K^T R_{SS} H_K}{H_K^T R_{NN} H_K} - J_K \frac{H_K^T R_{NN} H_K}{H_K^T R_{NN} H_K} \right) \qquad (2.95) $$

Using (2.25) in (2.95), we obtain:

$$ H_K^T G_K = 2 (J_K - J_K) = 0 $$

Therefore,

$$ H_K^T G_K = 0 \qquad (2.96) $$

Thus the filter vector $H_K$ at iteration step K is orthogonal to the gradient $G_K$ of the performance function J.

Q.E.D.
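The orthogonality stated in Lemma 2.08 is easy to confirm numerically. The sketch below is an illustration only: the two matrices, the trial filter, and the function name are assumptions. It evaluates J and its gradient, in the form $\frac{2}{H^T R_{NN} H}(R_{SS} - J R_{NN}) H$, and forms the inner product with H.

```python
# Numerical check that H^T G = 0 for the ratio J = (H^T R_SS H) / (H^T R_NN H).
# Matrices, trial filter, and names are illustrative assumptions.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(m, v):
    return [dot(row, v) for row in m]

def snr_and_gradient(h, r_ss, r_nn):
    """Return J and G = 2 / (H^T R_NN H) * (R_SS - J R_NN) H."""
    s = matvec(r_ss, h)
    nvec = matvec(r_nn, h)
    den = dot(h, nvec)
    j = dot(h, s) / den
    g = [2.0 / den * (a - j * b) for a, b in zip(s, nvec)]
    return j, g

r_ss = [[1.0, 0.2], [0.2, 3.0]]   # assumed signal correlation matrix
r_nn = [[2.0, 0.5], [0.5, 1.0]]   # assumed noise correlation matrix
h = [0.7, -1.3]                   # arbitrary trial filter
j, g = snr_and_gradient(h, r_ss, r_nn)
inner = dot(h, g)                 # H^T G, expected to be zero up to rounding
```

The result reflects the fact that J is unchanged when H is scaled, so moving along H itself cannot change J.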
2. Steepest Descent Method (SD)

The steepest descent (SD) method as described for the quadratic mMSE performance function can be used here for the MSNR performance function with some exceptions:

The concept of the "best step" is used by an approximation of the best adaptation gain. The gradient of the performance function with respect to the filter vector is a function of the performance function. Thus, successive values of the performance function must be computed.

The adaptation equation used here is identical to (2.56):

$$ H_{K+1} = H_K + \alpha_K \cdot G_K $$

The adaptation gain is obtained from (2.87) by replacing the step direction vector $V_K$ with the gradient $G_K$ (the direction of the SD).

The adaptation gain obtained is:

$$ \alpha_K = - \frac{H_K^T (R_{SS} - J_K R_{NN}) G_K}{G_K^T (R_{SS} - J_K R_{NN}) G_K} \qquad (2.97) $$

The matrix $Q_K$ is defined as:

$$ Q_K \triangleq R_{SS} - J_K R_{NN} \qquad (2.98) $$

Substituting (2.98) into (2.97), we obtain:

$$ \alpha_K = - \frac{H_K^T Q_K G_K}{G_K^T Q_K G_K} \qquad (2.99) $$

The adaptive filter designed by the SD method is carried out in the following steps:

[Step 1] Select a starting filter vector $H_0$ and a stopping bound $\delta$.

[Step 2] Compute the performance function $J_K$ as in (2.25):

$$ J_K = \frac{H_K^T R_{SS} H_K}{H_K^T R_{NN} H_K} $$

[Step 3] Compute the gradient $G_K \triangleq \nabla_H J_K$:

$$ G_K = \frac{2}{H_K^T R_{NN} H_K} \cdot Q_K \cdot H_K $$

where $Q_K$ is given by (2.98).

[Step 4] Compute the adaptation gain:

$$ \alpha_K = - \frac{H_K^T Q_K G_K}{G_K^T Q_K G_K} $$

[Step 5] Update the filter vector $H_K$:

$$ H_{K+1} = H_K + \alpha_K G_K $$

[Step 6] Test for stopping condition.

If $\dfrac{\| H_{K+1} - H_K \|}{\| H_K \|} < \delta$, then terminate the adaptation. Otherwise, go to Step 2.

The stopping condition is different from the one used for the mMSE criterion because in this case the gradient $G_K$ is a nonlinear function of the filter $H_K$, and when $H_K \to \infty$ the gradient $G_K \to 0$ (use (2.83) to verify). Thus, a vanishing gradient does not necessarily indicate the stationary point; the gradient can also vanish when the system diverges.

3. Accelerated Steepest Descent Method (ASD)
The ASD method derived in (II.C.2) is applied in this section with some modifications to design an adaptive filter which maximizes the performance function J in (2.25).

The adaptive filter is updated according to (2.67):

$$ H_{K+1} = H_K + \alpha_K \cdot V_K $$

The step direction vector $V_K$ is computed from the filter vector $H_K$ and the gradient $G_K$ of the performance function $J_K$:
So for K 


Nh 
ho 
Ww 
f 
aD 
Or 


A 
ul 
Gi 

v1 
wn 
v1 
| 


The gradient $G_K$ is obtained from (2.83) and the adaptation gain from (2.87) and Lemma 2.07.

The adaptive filter designed by the ASD method is carried out in the following steps:

[Step 1] Set a starting filter vector $H_0 = H_1$, stopping bound $\delta$, and compute the performance function J and the gradient G.

[Step 2] Compute the performance function value at iteration step K:

$$ J_K = \frac{H_K^T R_{SS} H_K}{H_K^T R_{NN} H_K} $$

[Step 3] Compute the gradient $G_K$ of $J_K$ with respect to $H_K$:

$$ G_K = \frac{2}{H_K^T R_{NN} H_K} \cdot Q_K \cdot H_K $$

where $Q_K$ is given by (2.98).


[Step 4] Compute the step direction vector ie 


- Gy fom 


il 
ho 
~~ 
> 
ww 
Or 


Hy, - Hy, for K 


il 
1. 
~~ 
uw 
~~ 
—] 


[Step 5] Compute the adaptation gain:

$$ \alpha_K = - \frac{H_K^T Q_K V_K}{V_K^T Q_K V_K} $$

[Step 6] Update the filter vector:

$$ H_{K+1} = H_K + \alpha_K V_K $$

[Step 7] Test for stopping condition:

If $\dfrac{\| H_{K+1} - H_K \|}{\| H_K \|} < \delta$, terminate the adaptation, otherwise go to Step 2.


4. Fletcher-Reeves Conjugate Gradient Method (CGF)

The Fletcher-Reeves conjugate gradient (CGF) method is applied to the MSNR adaptive filter in a similar way as for the mMSE adaptive filter. However, the nonlinear MSNR performance function requires more computation and does not use the true "best step" but an approximation. The "restart" concept was used and found to be able to accelerate the convergence.

The adaptive filter based on the CGF method is updated by the following iterative scheme:

$$ H_{K+1} = H_K + \alpha_K V_K \qquad (2.100) $$

The step direction vector $V_K$ is obtained as in (II.C.4) by the expression:

$$ V_K = \begin{cases} - G_K & \text{if } K \text{ Mod } n = 0 \\ - G_K + \dfrac{G_K^T G_K}{G_{K-1}^T G_{K-1}} V_{K-1} & \text{else} \end{cases} \qquad (2.101) $$

The adaptation gain $\alpha_K$ is obtained from Lemma (2.07) and given by:

$$ \alpha_K = - \frac{H_K^T (R_{SS} - J_K R_{NN}) V_K}{V_K^T (R_{SS} - J_K R_{NN}) V_K} \qquad (2.102) $$

Using definition (2.98), we obtain:

$$ \alpha_K = - \frac{H_K^T Q_K V_K}{V_K^T Q_K V_K} \qquad (2.103) $$


The adaptive filter designed by the CGF method is carried out in the following steps:

[Step 1] Select a starting filter vector $H_0$ and a stopping bound $\delta$.

[Step 2] Compute the performance function $J_K$:

$$ J_K = \frac{H_K^T R_{SS} H_K}{H_K^T R_{NN} H_K} $$

[Step 3] Compute the gradient $G_K$ of $J_K$ with respect to $H_K$:

$$ G_K = \frac{2}{H_K^T R_{NN} H_K} \cdot Q_K \cdot H_K $$

where $Q_K$ is given by (2.98).

[Step 4] Compute the step direction vector $V_K$:

$$ V_K = \begin{cases} - G_K & \text{if } K \text{ Mod } n = 0 \\ - G_K + \dfrac{G_K^T G_K}{G_{K-1}^T G_{K-1}} V_{K-1} & \text{else} \end{cases} $$

[Step 5] Compute the adaptation gain:

$$ \alpha_K = - \frac{H_K^T Q_K V_K}{V_K^T Q_K V_K} $$

[Step 6] Update the filter vector $H_K$:

$$ H_{K+1} = H_K + \alpha_K V_K $$

[Step 7] Test for stopping condition.

If $\dfrac{\| H_{K+1} - H_K \|}{\| H_K \|} < \delta$, then terminate the adaptation. Otherwise go to Step 2.


5. Polak-Ribière Conjugate Gradient Method (CGP)

The CGP method is similar to the CGF method. The only difference is the way the step direction is computed. The CGP method uses the following expression to compute the step direction vector $V_K$:

$$ V_K = \begin{cases} - G_K & \text{if } K \text{ Mod } n = 0 \\ - G_K + \dfrac{G_K^T (G_K - G_{K-1})}{G_{K-1}^T G_{K-1}} V_{K-1} & \text{else} \end{cases} $$

All the rest is identical to the CGF method. However, this method was found to converge much faster than the CGF for all the images tested in this thesis.

The adaptive filter designed by the CGP method is carried out in the following steps:

[Step 1] Select a starting filter vector $H_0$ and a stopping bound $\delta$.

[Step 2] Compute the performance function $J_K$:

$$ J_K = \frac{H_K^T R_{SS} H_K}{H_K^T R_{NN} H_K} $$

[Step 3] Compute the gradient $G_K$ of $J_K$ with regard to $H_K$:

$$ G_K = \frac{2}{H_K^T R_{NN} H_K} \cdot Q_K \cdot H_K $$

where $Q_K$ is given by (2.98).

[Step 4] Compute the step direction vector $V_K$:

$$ V_K = \begin{cases} - G_K & \text{if } K \text{ Mod } n = 0 \\ - G_K + \dfrac{G_K^T (G_K - G_{K-1})}{G_{K-1}^T G_{K-1}} V_{K-1} & \text{else} \end{cases} $$

[Step 5] Compute the adaptation gain:

$$ \alpha_K = - \frac{H_K^T Q_K V_K}{V_K^T Q_K V_K} $$

[Step 6] Update the filter vector $H_K$:

$$ H_{K+1} = H_K + \alpha_K V_K $$

[Step 7] Test for stopping condition:

If $\dfrac{\| H_{K+1} - H_K \|}{\| H_K \|} < \delta$, then terminate the adaptation. Otherwise go to Step 2.


6. Davidon-Fletcher-Powell Variable Metric Method (DFP)

The DFP method is applied to the MSNR adaptive filter in a similar way as for the mMSE adaptive filter. The major difference is the approximation of the adaptation gain and the need to evaluate the performance function at every iteration step K.

The adaptive filter based on the DFP method is updated by the following iterative scheme:

$$ H_{K+1} = H_K + \alpha_K V_K $$

The step direction vector $V_K$ is obtained by the variable metric:

$$ V_K = A_K \cdot G_K \qquad (2.105) $$

The adaptation gain is obtained from Lemma (2.07):

$$ \alpha_K = - \frac{H_K^T Q_K V_K}{V_K^T Q_K V_K} \qquad (2.106) $$

where $Q_K$ is given by (2.98). The metric $A_K$ is updated by the DFP iterative procedure:

$$ A_{K+1} = A_K + \alpha_K \frac{V_K V_K^T}{V_K^T P_K} - \frac{A_K P_K P_K^T A_K}{P_K^T A_K P_K} \qquad (2.107) $$

The vector $P_K$ in (2.107) is defined as:

$$ P_K = G_{K+1} - G_K \qquad (2.108) $$


The adaptive filter designed by the DFP method is carried out in the following steps:

[Step 1] Select a starting filter vector $H_0$, the starting correction metric $A_0 = I$ (where I is the identity matrix), compute the gradient $G_0$ of the performance function as before, and set the stopping bound $\delta$.

[Step 2] Compute the step direction vector $V_K$:

$$ V_K = A_K G_K $$

[Step 3] Compute the adaptation gain $\alpha_K$:

$$ \alpha_K = - \frac{H_K^T Q_K V_K}{V_K^T Q_K V_K} $$

[Step 4] Update the filter vector $H_K$:

$$ H_{K+1} = H_K + \alpha_K V_K $$

[Step 5] Compute the performance function $J_{K+1}$:

$$ J_{K+1} = \frac{H_{K+1}^T R_{SS} H_{K+1}}{H_{K+1}^T R_{NN} H_{K+1}} $$

[Step 6] Compute the gradient $G_{K+1}$ of the performance function $J_{K+1}$ with respect to the filter vector $H_{K+1}$:

$$ G_{K+1} = \frac{2}{H_{K+1}^T R_{NN} H_{K+1}} \cdot Q_{K+1} \cdot H_{K+1} $$

where $Q_{K+1}$ is obtained by updating (2.98).

[Step 7] Compute the vector $P_K$ by (2.108):

$$ P_K = G_{K+1} - G_K $$

[Step 8] Update the correction matrix $A_{K+1}$ by (2.107):

$$ A_{K+1} = A_K + \alpha_K \frac{V_K V_K^T}{V_K^T P_K} - \frac{A_K P_K P_K^T A_K}{P_K^T A_K P_K} $$

[Step 9] Test for stopping condition.

This step can be done after Step 4 to save some extra computations but was placed here to follow the same pattern as all other methods.

If $\dfrac{\| H_{K+1} - H_K \|}{\| H_K \|} < \delta$, then terminate the adaptation procedure. Otherwise go to Step 2.

7. Amir's Transform Method (AT)


From test results, which will be shown in (2.17 - 
Yeezo), it was observed that a faster convergence method will 
be helpful for designing the MSNR adaptive filter. Both the conjugate gradient method by Pollack and the variable metric method following Davidon do not exhibit the same convergence speed as for the quadratic mMSE case. The reason for this slower convergence is that the MSNR performance function shown in (2.25) is nonlinear and nonquadratic. It was then decided to derive a method tailored for this performance function. The derivation of this method is based on the generalized eigenvalue/eigenvector problem introduced by the stationary points of the performance function J in (2.25). The stationary point of the performance function J in (2.25) satisfies (2.40), which can be written in the form:

\[ (R_{SS} - J_{max} R_{NN}) H^* = 0 \tag{2.109} \]

where $H^*$ is the optimal filter vector which maximizes the performance function J.


The optimal filter $H^*$ satisfies

\[ H^* = \frac{1}{J_{max}} R_{NN}^{-1} R_{SS} H^* \tag{2.110} \]

From equation (2.110), it is obvious that an adaptive filter designed by using the transform matrix $F = R_{NN}^{-1} R_{SS}$ for updating the filter will satisfy (2.110) if it converges to the optimum.
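The stationary-point condition (2.109) can be checked numerically with a minimal sketch. The matrices below are made-up illustrative values, not data from the test images, and the function name `msnr_optimum` is hypothetical. The sketch assumes that $R_{SS}$ and $R_{NN}$ are symmetric positive definite, so that the eigenvalues of $R_{NN}^{-1} R_{SS}$ are real.

```python
import numpy as np

def msnr_optimum(R_ss, R_nn):
    """Return (J_max, H_star): largest eigenvalue of R_nn^{-1} R_ss and its eigenvector."""
    F = np.linalg.solve(R_nn, R_ss)            # transform matrix F = R_nn^{-1} R_ss
    eigvals, eigvecs = np.linalg.eig(F)
    i = int(np.argmax(eigvals.real))           # index of the dominant eigenvalue
    return float(eigvals[i].real), eigvecs[:, i].real

# Illustrative (made-up) symmetric positive definite correlation matrices.
R_ss = np.array([[3.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 0.5]])
R_nn = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]])

J_max, H_star = msnr_optimum(R_ss, R_nn)
residual = (R_ss - J_max * R_nn) @ H_star                          # left side of (2.109)
J_at_H = (H_star @ R_ss @ H_star) / (H_star @ R_nn @ H_star)       # ratio form of (2.25)
```

Under these assumptions `residual` is numerically zero and `J_at_H` equals `J_max`, which is the sense in which the optimum filter is the dominant generalized eigenvector.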
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In order to accelerate the convergence of such an 
adaptive filter, a gradient search is added to update the 
filter vector. The steepest descent search direction is 
adopted. The "best step" concept is used partially to compute the adaptation gain.

At the Kth iteration step, the filter update equation is described by:

\[ H_{K+1} = \frac{1}{J_K} R_{NN}^{-1} R_{SS} H_K + \alpha_K G_K \tag{2.111} \]

The transform matrix $M_K$ is defined as:

\[ M_K = \frac{1}{J_K} R_{NN}^{-1} R_{SS} \tag{2.112} \]

Using (2.112) in (2.111), we obtain:

\[ H_{K+1} = M_K H_K + \alpha_K G_K \tag{2.113} \]

The adaptation gain for the AT method can be obtained by Lemma 2.05:

\[ G_{K+1}^T G_K = 0 \tag{2.57} \]

From (2.83) and (2.98), the gradient $G_{K+1}$ is:

\[ G_{K+1} = \frac{2}{H_{K+1}^T R_{NN} H_{K+1}} Q_{K+1} H_{K+1} \tag{2.114} \]

Using (2.113) and (2.114) in (2.57), we obtain:

\[ \left[ \frac{2}{H_{K+1}^T R_{NN} H_{K+1}} Q_{K+1} (M_K H_K + \alpha_K G_K) \right]^T G_K = 0 \tag{2.115} \]

\[ (M_K H_K + \alpha_K G_K)^T Q_{K+1} G_K = 0 \tag{2.116} \]

(2.116) can be viewed as a dot product between two orthogonal vectors, and the expression can be modified to be:

\[ G_K^T (Q_{K+1} M_K H_K + \alpha_K Q_{K+1} G_K) = 0 \tag{2.117} \]

Solving (2.117) for $\alpha_K$, we obtain:

\[ \alpha_K = -\frac{G_K^T Q_{K+1} M_K H_K}{G_K^T Q_{K+1} G_K} \tag{2.118} \]

Since $Q_{K+1}$ is not known, we use Lemma (2.07) to approximate $Q_{K+1}$ by $Q_K$:

\[ \alpha_K = -\frac{G_K^T Q_K M_K H_K}{G_K^T Q_K G_K} \tag{2.119} \]

In order for the adaptation gain $\alpha_K$ in (2.119) to be acceptable (i.e., the adaptive filter will converge), it must satisfy the condition (2.90).

The adaptive filter designed by the AT method is carried out in the following steps:

[Step 1] Select a starting filter vector $H_0$ and a stopping bound $\delta$.

[Step 2] Compute the performance function $J_K$ as in (2.25):

\[ J_K = \frac{H_K^T R_{SS} H_K}{H_K^T R_{NN} H_K} \]

[Step 3] Compute the gradient $G_K = \nabla_H J_K$:

\[ G_K = \frac{2}{H_K^T R_{NN} H_K} Q_K H_K \]

[Step 4] Compute the adaptation gain by (2.119):

\[ \alpha_K = -\frac{G_K^T Q_K M_K H_K}{G_K^T Q_K G_K} \]

[Step 5] Update the filter vector $H_K$ by (2.113):

\[ H_{K+1} = M_K H_K + \alpha_K G_K \]

[Step 6] Test for stopping condition. If $\|H_{K+1} - H_K\| / \|H_K\| \le \delta$, then terminate the adaptation, otherwise go to step 2.
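The six steps can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the thesis program: the function names, the test matrices and the starting vector are hypothetical, $Q_K = R_{SS} - J_K R_{NN}$ is assumed for (2.98), the gain uses the sign obtained by solving (2.117), and condition (2.90) is enforced by a simple safeguard (the gradient term is dropped whenever it does not improve on the pure transform step), which is an assumption of this sketch.

```python
import numpy as np

def snr(H, R_ss, R_nn):
    """Performance function J of (2.25)."""
    return (H @ R_ss @ H) / (H @ R_nn @ H)

def amir_transform_filter(R_ss, R_nn, H0, delta=1e-10, max_iter=1000):
    H = np.asarray(H0, dtype=float).copy()
    F = np.linalg.solve(R_nn, R_ss)              # R_nn^{-1} R_ss
    for step in range(1, max_iter + 1):
        hnh = H @ R_nn @ H
        J = (H @ R_ss @ H) / hnh                 # Step 2
        Q = R_ss - J * R_nn                      # assumed form of (2.98)
        G = (2.0 / hnh) * (Q @ H)                # Step 3
        MH = (F @ H) / J                         # M_K H_K, see (2.112)
        denom = G @ Q @ G
        alpha = -(G @ Q @ MH) / denom if denom != 0.0 else 0.0   # Step 4, (2.119)
        H_new = MH + alpha * G                   # Step 5, (2.113)
        # Safeguard (assumption of this sketch): keep J from decreasing, cf. (2.90).
        ok = np.all(np.isfinite(H_new)) and snr(H_new, R_ss, R_nn) >= snr(MH, R_ss, R_nn)
        if not ok:
            H_new = MH
        change = np.linalg.norm(H_new - H) / np.linalg.norm(H)   # Step 6
        H = H_new
        if change <= delta:
            break
    return H, snr(H, R_ss, R_nn), step

# Illustrative (made-up) symmetric positive definite correlation matrices.
R_ss = np.array([[3.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 0.5]])
R_nn = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]])
H_at, J_at, n_steps = amir_transform_filter(R_ss, R_nn, np.ones(3))
J_best = np.linalg.eigvals(np.linalg.solve(R_nn, R_ss)).real.max()
```

For these matrices `J_at` should agree with `J_best`, the largest eigenvalue of $R_{NN}^{-1} R_{SS}$. The step counts reported in the results section come from the infrared test images and are not reproduced by this toy example.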


E. CONVERGENCE AND CONVERGENCE RATE OF THE GRADIENT METHODS 


1. SD Adaptive Filter

Theorem 2.09

For any starting filter vector $H_0$, the sequence $\{H_K\}$ of the adaptive filter given by (2.56) converges to the unique optimal solution $H^*$ given by (2.12). Furthermore, the rate of convergence satisfies

\[ \frac{\|H_{K+1} - H^*\|}{\|H_K - H^*\|} \le \frac{1}{\beta} \cdot \frac{1 - C}{1 + C} \tag{2.120} \]

where C is the condition number of the Hessian matrix $R_{XX}$ of the performance function J given in (2.10) and $\beta$ is a constant. The condition number is defined as

\[ C = \frac{\lambda_S}{\lambda_L} \tag{2.121} \]

where $\lambda_L$, $\lambda_S$ are the largest and smallest eigenvalues of the Hessian matrix $R_{XX}$.

Proof

The Kantorovich inequality is used to prove the theorem.

The functional $\Psi_K$ is defined as:

\[ \Psi_K = (H_K - H^*)^T R_{XX} (H_K - H^*) \tag{2.122} \]

where $R_{XX}$ is the Hessian matrix of the performance function J in (2.10). For the adaptive filter, $R_{XX}$ is the correlation matrix of the observed image signal.

$H^*$ is the filter which minimizes (2.10). The filter vector $H^*$ is called the optimal filter. $\Psi$ is updated at iteration step K + 1 as:

\[ \Psi_{K+1} = (H_{K+1} - H^*)^T R_{XX} (H_{K+1} - H^*) \tag{2.123} \]

Using (2.56) to substitute for $H_{K+1}$ in (2.123), we have

\[ \Psi_{K+1} = (H_K + \alpha_K G_K - H^*)^T R_{XX} (H_K + \alpha_K G_K - H^*) \tag{2.124} \]

Using the definition (2.122) and the fact that $R_{XX}$ is a symmetric matrix, we obtain:

\[ \Psi_{K+1} = \Psi_K + 2 \alpha_K G_K^T R_{XX} (H_K - H^*) + \alpha_K^2 G_K^T R_{XX} G_K \tag{2.125} \]

The adaptation gain $\alpha_K$ is given by (2.66), which can be used in (2.125) to obtain:

\[ \Psi_{K+1} = \Psi_K + 2 \alpha_K G_K^T R_{XX} (H_K - H^*) - \frac{1}{2} \alpha_K G_K^T G_K \tag{2.126} \]

Using (2.05) and (2.12) in (2.126), we obtain:

\[ R_{XX} (H_K - H^*) = R_{XX} H_K - R_{XX} R_{XX}^{-1} R_{XS} = R_{XX} H_K - R_{XS} = \frac{1}{2} G_K \tag{2.127} \]

Substitution of (2.127) in (2.126) gives:

\[ \Psi_{K+1} = \Psi_K + \alpha_K G_K^T G_K - \frac{1}{2} \alpha_K G_K^T G_K = \Psi_K + \frac{1}{2} \alpha_K G_K^T G_K \tag{2.128} \]

Let us define the vector $E_K$ as

\[ E_K = H_K - H^* \tag{2.129} \]

Using this definition (2.129) in (2.122), we obtain:

\[ \Psi_K = E_K^T R_{XX} E_K \tag{2.130} \]

From (2.127), we obtain:

\[ E_K = \frac{1}{2} R_{XX}^{-1} G_K \tag{2.131} \]

Using (2.131) and the fact that $R_{XX}$ is symmetric, (2.130) becomes:

\[ \Psi_K = \frac{1}{4} G_K^T R_{XX}^{-1} G_K \tag{2.132} \]

Using (2.132) in (2.128), we obtain:

\[ \frac{\Psi_{K+1}}{\Psi_K} = 1 + \frac{2 \alpha_K G_K^T G_K}{G_K^T R_{XX}^{-1} G_K} \tag{2.133} \]

Substitution of $\alpha_K$ given by (2.66) in (2.133) gives:

\[ \frac{\Psi_{K+1}}{\Psi_K} = 1 - \frac{(G_K^T G_K)^2}{(G_K^T R_{XX} G_K)(G_K^T R_{XX}^{-1} G_K)} \tag{2.134} \]

Now the Kantorovich inequality is used:

\[ \frac{(G_K^T R_{XX} G_K)(G_K^T R_{XX}^{-1} G_K)}{(G_K^T G_K)^2} \le \frac{(\lambda_S + \lambda_L)^2}{4 \lambda_S \lambda_L} \tag{2.135} \]

where $\lambda_S$ and $\lambda_L$ are the smallest and largest eigenvalues of the matrix $R_{XX}$.

Using (2.135) in (2.134), we obtain:

\[ \frac{\Psi_{K+1}}{\Psi_K} \le 1 - \frac{4 \lambda_S \lambda_L}{(\lambda_S + \lambda_L)^2} \tag{2.136} \]

\[ \frac{\Psi_{K+1}}{\Psi_K} \le \left( \frac{\lambda_L - \lambda_S}{\lambda_L + \lambda_S} \right)^2 \tag{2.137} \]

Again, the condition number of the matrix $R_{XX}$ is defined as

\[ C = \frac{\lambda_S}{\lambda_L} \tag{2.138} \]

(2.137) becomes

\[ \frac{\Psi_{K+1}}{\Psi_K} \le \left( \frac{1 - C}{1 + C} \right)^2 < 1 \tag{2.139} \]

Since $R_{XX}$ is a positive definite matrix, the sequence $\{\Psi_K\}$ is a positive sequence.

Let us define

\[ q = \left( \frac{1 - C}{1 + C} \right)^2 \tag{2.140} \]

we obtain

\[ \Psi_{K+1} \le q \, \Psi_K \tag{2.141} \]

From (2.141), we can see that when $K \to \infty$, the sequence $\{\Psi_K\}$ converges to zero. The reason is that we have a decreasing positive sequence $\{\Psi_K\}$ with $\Psi_K \le q^K \Psi_0$ and $q < 1$, thus $\Psi_K \to 0$. It implies $E_K \to 0$ (use (2.130) to justify this statement). Since $E_K$ is defined as $H_K - H^*$ in (2.129), we conclude that:

\[ H_K - H^* \to 0 \quad \text{or} \quad H_K \to H^* \tag{2.142} \]

This completes the proof of convergence.

From (2.139), we observe that the rate of convergence of the sequence $\{\Psi_K\}$ is given by (2.140). However, $\Psi_K$ as defined in (2.122) is a quadratic function of the vector $H_K - H^*$; it satisfies the relation

\[ \frac{\Psi_{K+1}}{\Psi_K} = \beta^2 \left( \frac{\|H_{K+1} - H^*\|}{\|H_K - H^*\|} \right)^2 \tag{2.143} \]

where $\beta$ is a positive number.

Thus we obtain:

\[ \frac{\|H_{K+1} - H^*\|}{\|H_K - H^*\|} \le \frac{1}{\beta} \cdot \frac{1 - C}{1 + C} \tag{2.144} \]

(Q.E.D.)

Theorem (2.09) proves that the SD method exhibits at least linear convergence.
Definition

An algorithm with the property that

\[ \frac{\|H_{K+1} - H^*\|}{\|H_K - H^*\|} = a = \text{constant} \]

is said to exhibit linear convergence.

The linear convergence is sometimes called geometric convergence since it follows from the definition that for some K, j

\[ \|H_K - H^*\| = a^{K-j} \|H_j - H^*\| \]

The speed of convergence of the SD method is a function of the condition number C. The more ill-conditioned $R_{XX}$, the slower will be the rate of convergence.
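The bound (2.139) can be observed on a small quadratic problem. The sketch below is illustrative only: the matrix `R`, the vector `r` and the function name are made up, the performance function is taken as $J = H^T R_{XX} H - 2 H^T R_{XS}$ up to a constant so that $G = 2(R_{XX} H - R_{XS})$, and the gain is the exact line search value, which is assumed here to be the content of (2.66).

```python
import numpy as np

def sd_psi_ratios(R, r, H0, n_steps=8):
    """Run steepest descent and return the ratios Psi_{K+1} / Psi_K of (2.122)."""
    H_star = np.linalg.solve(R, r)                 # optimal filter
    H = np.asarray(H0, dtype=float).copy()
    ratios = []
    for _ in range(n_steps):
        G = 2.0 * (R @ H - r)                      # gradient of the quadratic
        gg = G @ G
        if gg == 0.0:
            break
        alpha = -gg / (2.0 * (G @ R @ G))          # exact line search gain (assumed (2.66))
        psi = (H - H_star) @ R @ (H - H_star)
        H = H + alpha * G                          # H_{K+1} = H_K + alpha_K G_K
        psi_next = (H - H_star) @ R @ (H - H_star)
        ratios.append(psi_next / psi)
    return ratios

R = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])   # made-up R_XX
r = np.array([1.0, 2.0, 3.0])                                        # made-up R_XS
lam = np.linalg.eigvalsh(R)
C = lam.min() / lam.max()                          # condition number as in (2.121)
q = ((1.0 - C) / (1.0 + C)) ** 2                   # bound (2.140)
ratios = sd_psi_ratios(R, r, np.zeros(3))
```

Every entry of `ratios` stays at or below `q`; making `R` more ill-conditioned pushes `q` toward 1, which is the slow-down described above.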

Theorem (2.09) used the mMSE quadratic function. 
For the MSNR performance function, it was shown (II.E.1) that 
the sequence {Hy} generated by the SD method converges. Test 
results showed that the convergence of SD is slower for MSNR 
than mMSE. 


2. ASD Adaptive Filter
The algorithm is illustrated in Fig. 2.01. 





Fig. 2.01 The ASD algorithm.
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The steepest descent steps are labeled SD, the accelerated 
steps are labeled ASD.

Shah, Buehler and Kempthorne [53] showed that for an n dimensional quadratic function, the sequence of iterates $H_0$, $H_2$, $H_4$, ... is identical to the full sequence of iterates generated by the conjugate-gradient descent. Since the conjugate gradient descent takes no more than n steps to reach the minimum of the n dimensional quadratic function, the accelerated steepest descent takes no more than (2n-1) steps.

Applying the ASD method to design a multidimensional adaptive filter using real test scene images has shown poor convergence speed for both the mMSE and MSNR performance functions. The reason is error propagation: these methods are sensitive to accumulated errors, and the computed iterates then no longer satisfy the condition for accelerated convergence.

3. CG Adaptive Filter

The conjugate gradient methods CGP and CGF exhibit quadratic termination (apart from rounding errors) for the mMSE performance function. Quadratic termination means that for a quadratic performance function it is guaranteed that the minimum will be located exactly (apart from rounding errors) in no more than n steps. However, for nonquadratic functions like (2.25) the conjugate gradient method does not exhibit quadratic termination. For the infinite dimensional case, Daniel [60] showed that the rate of convergence is:

\[ \frac{\|H_{K+1} - H^*\|}{\|H_K - H^*\|} \le \frac{\sqrt{\lambda_L} - \sqrt{\lambda_S}}{\sqrt{\lambda_L} + \sqrt{\lambda_S}} \]

where $\lambda_L$, $\lambda_S$ are the largest and smallest eigenvalues of the Hessian matrix of the performance function J.

Depending on the approach used to design the adaptive filter for a nonlinear, nonquadratic performance function, different rates of convergence can be obtained. Some approaches exhibit quadratic convergence (those which approximate the performance function by a Taylor series expansion). Others exhibit superlinear convergence.


Theorem 2.10 

Let the performance function J be defined by (2.10) and the adaptive filter be designed using the conjugate gradient method, then the sequence of adaptive filters $\{H_K\}$ converges in no more than n steps to the unique minimum $H^*$ of the performance function J.

Proof

The proof is based on the fact that both methods, CGF and CGP, are based on the conjugate direction search method, which implies that the step direction vector $V_K$ is orthogonal to the gradient of the performance function J at iteration step K+1. This fact is stated as: [ 5S, 61 ]

\[ G_{K+1}^T V_K = 0 \tag{2.146} \]

The adaptation equation is:

\[ H_{K+1} = H_K + \alpha_K V_K \tag{2.147} \]

Its expression at the iteration step n can be related to all steps from iteration step K by:

\[ H_n = H_{K+1} + \sum_{j=K+1}^{n-1} \alpha_j V_j \tag{2.148} \]

for any K < n.

The gradient of the performance function J at iteration step n is given by:

\[ G_n = 2 (R_{XX} H_n - R_{XS}) \tag{2.149} \]

By substituting (2.148) in (2.149), we get:

\[ G_n = G_{K+1} + 2 \sum_{j=K+1}^{n-1} \alpha_j R_{XX} V_j \tag{2.150} \]

Using equation (2.146) in (2.150), we obtain:

\[ V_K^T G_n = 2 \sum_{j=K+1}^{n-1} \alpha_j V_K^T R_{XX} V_j \tag{2.151} \]

The method of conjugate gradient is based on generating a conjugate sequence of step direction vectors $\{V_K\}$. The conjugacy condition is:

\[ V_K^T R_{XX} V_j = 0 \quad \text{for } K \ne j \tag{2.152} \]

We use (2.152) in (2.151) to show that:

\[ V_K^T G_n = 0 \tag{2.153} \]

The step direction vectors $V_0$, $V_1$, ..., $V_{n-1}$ form a complete conjugate basis. Therefore, at iteration step n, the only $G_n$ which satisfies (2.153) is

\[ G_n = 0 \tag{2.154} \]

But for the quadratic performance function, the gradient vanishes only at the minimum. So we proved that the method converges to the minimum of J in (2.10) in no more than n steps.

Substituting (2.154) in (2.149), we obtain

\[ 2 (R_{XX} H_n - R_{XS}) = 0 \tag{2.155} \]

\[ H_n = R_{XX}^{-1} R_{XS} \tag{2.156} \]

so

\[ H_n = H^* \tag{2.157} \]

Thus the filter converges to the unique minimum of J.

Q.E.D.
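The n-step termination can be illustrated with a minimal Fletcher-Reeves sketch on a small quadratic. The matrix `R`, vector `r` and function name are hypothetical; the sign convention (direction $V = -G$ with a positive gain from an exact line search) is an assumption of this sketch and may differ from the convention used for $\alpha_K$ in the adaptation equations above.

```python
import numpy as np

def conjugate_gradient_fr(R, r, H0):
    """Fletcher-Reeves conjugate gradient on J = H^T R H - 2 H^T r, run for at most n steps."""
    n = len(r)
    H = np.asarray(H0, dtype=float).copy()
    G = 2.0 * (R @ H - r)                        # gradient, as in (2.149)
    V = -G                                       # first direction: steepest descent
    for _ in range(n):
        gg = G @ G
        if gg == 0.0:                            # already at the minimum
            break
        alpha = -(G @ V) / (2.0 * (V @ R @ V))   # exact line search along V
        H = H + alpha * V                        # adaptation equation (2.147)
        G_new = 2.0 * (R @ H - r)
        beta = (G_new @ G_new) / gg              # Fletcher-Reeves coefficient
        V = -G_new + beta * V                    # next conjugate direction
        G = G_new
    return H

R = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])   # made-up R_XX
r = np.array([1.0, 2.0, 3.0])                                        # made-up R_XS
H_cg = conjugate_gradient_fr(R, r, np.zeros(3))
H_opt = np.linalg.solve(R, r)                    # closed form (2.156)
```

After n = 3 steps `H_cg` matches `H_opt` to rounding accuracy, in line with (2.156) and (2.157).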


In practical applications, it was found that the conjugate gradient methods sometimes converge in more than n steps. The reason is the round-off error. The two conditions (2.146) and (2.152) are not satisfied exactly, so the sequence $\{V_K\}$ of step directions does not form a complete basis in n iteration steps.

For the MSNR cases, the adaptive filter could not converge as fast as in the mMSE cases because the performance function J in the MSNR case is nonquadratic.
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4, The DFP Method 
The Davidon-Fletcher-Powell (DFP) method exhibits quadratic termination, apart from rounding errors, for the mMSE performance function.

Fletcher and Powell [58] proved for a general performance function J that a positive definite variable metric $A_K$ generates a positive definite $A_{K+1}$ updated by (2.77). They showed that for a quadratic function like the mMSE type, successive filter updates $\Delta H_0$, $\Delta H_1$, ..., $\Delta H_{n-1}$ form a set of conjugate directions, and $A_n = R_{XX}^{-1}$. So the DFP method exhibits quadratic termination in n steps.
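Quadratic termination of the DFP update can be tried on a small quadratic with the sketch below. It is a hedged illustration, not the thesis code: the names and test values are made up, the descent convention $V = -A G$ with an exact line search is assumed, and the metric update is written in the standard DFP form with $S = H_{K+1} - H_K$ and $P = G_{K+1} - G_K$, which corresponds to (2.107) when $S = \alpha_K V_K$.

```python
import numpy as np

def dfp_quadratic(R, r, H0, n_steps):
    """DFP variable metric method on J = H^T R H - 2 H^T r with exact line searches."""
    H = np.asarray(H0, dtype=float).copy()
    A = np.eye(len(H))                           # starting metric A_0 = I
    G = 2.0 * (R @ H - r)
    for _ in range(n_steps):
        if np.linalg.norm(G) < 1e-14:            # gradient already negligible
            break
        V = -A @ G                               # step direction (descent convention)
        alpha = -(G @ V) / (2.0 * (V @ R @ V))   # exact line search along V
        S = alpha * V                            # filter update Delta H
        H = H + S
        G_new = 2.0 * (R @ H - r)
        P = G_new - G                            # gradient difference, cf. (2.108)
        AP = A @ P
        A = A + np.outer(S, S) / (S @ P) - np.outer(AP, AP) / (P @ AP)   # DFP update
        G = G_new
    return H

R = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])   # made-up R_XX
r = np.array([1.0, 2.0, 3.0])                                        # made-up R_XS
H_dfp = dfp_quadratic(R, r, np.zeros(3), n_steps=3)
H_opt = np.linalg.solve(R, r)
```

With n = 3 and exact line searches, `H_dfp` reaches `H_opt` after three updates, apart from rounding errors.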


The MSNR performance function is nonquadratic and nonlinear, so the DFP method cannot exhibit quadratic termination. But according to our test results, it is still a fast convergence technique. If the method converges slowly, it is recommended to restart the variable metric every n+1 steps by setting $A_{K+1} = I$, to overcome round-off errors.
5. The AT Adaptive Filter

The Amir transform adaptive filter exhibits very fast convergence speed. The reason lies in the way it was designed. Each iterative step uses a transform to satisfy the generalized eigenvalue and eigenvector steady state equation.

Theorem 2.11

Let the adaptive filter be updated by (2.111) and the performance function defined by (2.25). Then the filter $H_K$ converges to the unique optimal filter $H^*$, if the adaptation gain $\alpha_K$ satisfies condition (2.90).

Proof

The adaptive filter $H_K$ is updated by (2.111):

\[ H_{K+1} = \frac{1}{J_K} R_{NN}^{-1} R_{SS} H_K + \alpha_K G_K \tag{2.111} \]

Substituting (2.83) for $G_K$ in (2.111), and defining

\[ \beta_K = \frac{2}{H_K^T R_{NN} H_K} \tag{2.158} \]

we obtain:

\[ H_{K+1} = \frac{1}{J_K} R_{NN}^{-1} R_{SS} H_K + \alpha_K \beta_K (R_{SS} - J_K R_{NN}) H_K \tag{2.159} \]

Rearranging (2.159) gives:

\[ H_{K+1} = \left[ \frac{1}{J_K} R_{NN}^{-1} R_{SS} + \alpha_K \beta_K J_K R_{NN} \left( \frac{1}{J_K} R_{NN}^{-1} R_{SS} - I \right) \right] H_K \tag{2.160} \]

Subtracting $H_K$ from both sides, we obtain:

\[ H_{K+1} - H_K = (I + \alpha_K \beta_K J_K R_{NN}) \left( \frac{1}{J_K} R_{NN}^{-1} R_{SS} - I \right) H_K \tag{2.161} \]

where I is the identity matrix.

Let us define the matrix $Z_K$ as:

\[ Z_K = I + \alpha_K \beta_K J_K R_{NN} \tag{2.162} \]

Since $R_{NN}$, $R_{SS}$ are positive definite matrices, and $\alpha_K$, $J_K$, $\beta_K$ are positive numbers, $Z_K$ is positive definite. Since $\alpha_K$, $J_K$, $\beta_K$ are bounded, the norm of the matrix $Z_K$ is bounded. In other words, there exists a positive number A such that

\[ \|Z_K\| \le A \tag{2.163} \]

Taking the norm of (2.161) and using the inequality

\[ \|A B\| \le \|A\| \, \|B\| \tag{2.164} \]

where A, B are matrices, and combining with (2.163), we obtain:

\[ \|H_{K+1} - H_K\| \le A \left\| \frac{1}{J_K} R_{NN}^{-1} R_{SS} - I \right\| \|H_K\| \tag{2.165} \]

which turns out to be

\[ \frac{\|H_{K+1} - H_K\|}{\|H_K\|} \le A \left\| \frac{1}{J_K} R_{NN}^{-1} R_{SS} - I \right\| \tag{2.166} \]

If the convergence error $E_K$ is defined as

\[ E_K = \frac{\|H_{K+1} - H_K\|}{\|H_K\|} \tag{2.167} \]

then

\[ E_K \le A \left\| \frac{1}{J_K} R_{NN}^{-1} R_{SS} - I \right\| \tag{2.168} \]

The largest eigenvalue of $\frac{1}{J_K} R_{NN}^{-1} R_{SS} - I$ is

\[ \frac{\lambda_{max}}{J_K} - 1 \tag{2.169} \]

where $\lambda_{max}$ is the largest eigenvalue of $R_{NN}^{-1} R_{SS}$. But

\[ \left\| \frac{1}{J_K} R_{NN}^{-1} R_{SS} - I \right\| = \frac{J_{max}}{J_K} - 1 \tag{2.170} \]

So

\[ E_K \le A \left( \frac{J_{max}}{J_K} - 1 \right) \tag{2.171} \]

The adaptation gain $\alpha_K$ is designed to satisfy condition (2.90), which states that:

\[ J_{K+1} \ge J_K \]

Iterating (2.171) and using condition (2.90), we have:

\[ E_{K+1} \le A \left( \frac{J_{max}}{J_{K+1}} - 1 \right) \le A \left( \frac{J_{max}}{J_K} - 1 \right) \tag{2.172} \]

since $J_K \le J_{K+1}$.

Thus, the sequence $\{E_K\}$ converges to zero, because $J_{max}$ is the maximum value of the unimodal performance function, and the sequence $\{J_K\}$ is an ascending sequence bounded by the upper bound $J_{max}$, so

\[ \lim_{K \to \infty} E_K \le A \left( \frac{J_{max}}{\lim_{K \to \infty} J_K} - 1 \right) = A \left( \frac{J_{max}}{J_{max}} - 1 \right) = 0 \tag{2.173} \]

This proves that the filter converges. At the convergence point, (2.170) satisfies

\[ \left\| \frac{1}{J_\infty} R_{NN}^{-1} R_{SS} - I \right\| = 0 \tag{2.174} \]

or, in the vector form

\[ \left( \frac{1}{J_\infty} R_{NN}^{-1} R_{SS} - I \right) H_\infty = 0 \tag{2.175} \]

(2.175) is the equation for the stationary points of the performance function J. Thus, $J_\infty = J_{max}$ and correspondingly $H_\infty = H^*$.

So the adaptive filter converges to the unique optimum.

(Q.E.D.)


F. PRESENTATION OF RESULTS

1. Organization of Results

The performances of both mMSE and MSNR nonrecursive adaptive spatial filters have been extensively evaluated on two real world infrared images, shown in Fig. 2.1 and 2.2. Before the detailed presentation of these results, a description is given of how the evaluations are organized.

(a) Filter type:
- Nonrecursive adaptive spatial filter 


- Search box (filter size) 3 by 3 pixels with 
the estimation pixel in the middle of the filter 


(b) Optimization criterion and performance function: 
- mMSE: Minimization of mean square error 
- MSNR: Maximization of signal to noise ratio 
(c) Adaptation equation: 


1. LMS approach: 
ey yg Gules 
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foeeeeez.l A 9 level 
computer print of 
Indiana infrared
test image. 


Fig. 2.2 A 9 level
computer print of 
China Lake infrared 
test image. 
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(d) 


(e) 


Gradient approaches: 


G 


H Kk * %K &x 


—K+] = Ht 


Conjugate gradient approaches: 


Heed * Be * OK Ye 


Variable metric approach: 


=O ea 


H eee eK 


—K+] 
Amir's transform approach: 


=M, + c 


H Mk t+ OR && 


—K+1 


Search methods: 


LMS approach: 
Steepest descent method 
Gradient approaches: 


Steepest descent method 
Accelerated steepest descent method 


Amir's method (apply only to mMSE case) 
Conjugate gradient approaches: 


Fletcher-Reeves method 
Pollack method 


Variable metric approaches: 
Davidon-Fletcher-Powell method 
Amir's transform approach: 


Apply only to MSNR case 


Test images used: 


Indiana image (Fig. 2.1): 


Oe oe Dike ls 
Blue spike infrared spectral band 
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(f) 


An image taken from a city in Indiana. It has been used extensively as a standard test image for high altitude downward looking infrared surveillance system.


China Lake image (Fig. 2.2): 


G2eoc pixels 
Thermal infrared band in 10-15μ range

An image taken from a desert area in China Lake, California with a highway in the picture. It has been used as one of the standard test images for short range horizontal looking infrared target acquisition system.


Performance evaluation: 


The performance of the adaptive filters is presented in four different ways, all as a function of the number of iterative steps N.

1. Filter coefficients as a function of N (9 coefficients for a 3 x 3 spatial filter).

2. Output variance as a function of N.

3. Processing gain as a function of N. The processing gain is defined as follows:
2 2 
ae) 


where $m_i$, $m_o$ = means of the input and output images respectively;

$\sigma_i^2$, $\sigma_o^2$ = variances of the input and output images respectively.

4. Output signal to noise ratio (used only in MSNR cases). Output SNR of the filtered image is defined as follows:

\[ SNR_o = \frac{H^T R_{SS} H}{H^T R_{NN} H} \]

where H = filter vector
$R_{SS}$ = target signal correlation matrix
$R_{NN}$ = clutter noise correlation matrix
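The output SNR definition translates directly into code. The sketch below is only an illustration: the function name `output_snr`, the point-target signal vector and the white clutter matrix are assumptions chosen so that the result is easy to check by hand, not values taken from the test images.

```python
import numpy as np

def output_snr(H, R_ss, R_nn):
    """Output SNR of the filtered image: (H^T R_ss H) / (H^T R_nn H)."""
    H = np.asarray(H, dtype=float)
    return float(H @ R_ss @ H) / float(H @ R_nn @ H)

# 3 x 3 filter stored as a 9-vector; the estimation pixel is the middle entry.
s = np.zeros(9)
s[4] = 1.0                      # hypothetical point target at the center pixel
R_ss = np.outer(s, s)           # target signal correlation matrix (rank one)
R_nn = np.eye(9)                # hypothetical uncorrelated unit-variance clutter

H = np.zeros(9)
H[4] = 1.0                      # pass-through filter
snr_unit = output_snr(H, R_ss, R_nn)
snr_scaled = output_snr(3.0 * H, R_ss, R_nn)
```

Because the ratio is unchanged when H is multiplied by a constant, `snr_scaled` equals `snr_unit`; this is why the steady state coefficients can be normalized to the estimation pixel without affecting the SNR.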


2. Results of mMSE Adaptive Spatial Filters I - Indiana Image

The test results of adaptive filters based on the mMSE criterion and using the Indiana test image are presented in the following figures:

Fig. 2.3 - LMS approach, steepest descent method

Fig. 2.4 - Gradient approach, steepest descent method

Fig. 2.5 - Gradient approach, accelerated steepest descent method

Fig. 2.6 - Gradient approach, Amir's method

Fig. 2.7 - Conjugate gradient approach, Fletcher-Reeves method

Fig. 2.8 - Conjugate gradient approach, Pollack method

Fig. 2.9 - Variable metric approach, Davidon-Fletcher-Powell method.

In each figure, three results - the nine filter coefficients, output variance and processing gain - are presented as a function of iteration steps.
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LMS Algorithm 




Filter Vector 
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Pollack Method - mMSE 
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The following additional numerical results are 
presented in Table II-1:
- Processing gain 
- Mean of the filtered image 
- Variance of the filtered image 


- Number of iteration steps to go below the prescribed 
error 


- Actual adaptation error when the adaptation is 
stopped. 


a. Discussion 
These results will be discussed in several groups. 
(1) LMS Approach and Steepest Descent Method. 
This approach is the two dimensional extension of the most 
widely used adaptive filter technique. In Fig. 2.3, we can see that the adaptation took close to one thousand steps to reach the minimum of the output variance and the maximum of the processing gain. However, the adaptation never achieved a steady state, even up to 10,000 steps of iteration. Further, there is a steady state deviation from the optimum output variance. It is known as the "misadjustment" which commonly occurs in the traditional adaptive
Filter approach Csi 
We believe these problems are the consequences of the basic assumptions of this LMS algorithm. The reasons probably are not obvious if we follow the traditional adaptive concept which was initiated by Prof. Widrow using the error signal concept in control, $\epsilon = H^T x - d$. The filter output, $H^T x$, is compared with a desirable result, d. Their difference, $\epsilon$, is used together with a constant, but adjustable, adaptive gain $2\mu$, to form a correction term, $\Delta H$, for the filter coefficients to approach the optimization goal, which is the minimization of mean square error.

On the other hand, if we consider the adaptation procedure as an optimization process, then the adaptation equation takes the form of

\[ H_{K+1} = H_K + \Delta H_K = H_K + \alpha_K G_K \]

where $G_K$ is called the "gradient" and $\alpha_K$ is called the "step size." The concept of gradient means the gradient of the performance function surface, J. The product of the adaptation step size $\alpha_K$ and the gradient $G_K$ is the correction term $\Delta H_K$.

It is postulated that the assumptions made by the LMS approach are not directly responsive to the goal of adaptation because the error term $H^T x - d$ is not directly related to the minimization of the performance function. Further, the assumption that the adaptive gain $2\mu$, which corresponds to the concept of step size in optimization, is constant does not coincide with the fact that the iterative steps toward optimization usually take place in varying step sizes. These problems contributed to the slow convergence and the steady state misadjustment in the LMS adaptive spatial filters.
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We developed several adaptive filters using 
gradient methods developed in the optimization field. Their results are discussed in the following.


(2) Gradient Approaches. Three different methods were developed. Their results are shown in Figures 2.4, 2.5, and 2.6 for the steepest descent (SD), accelerated steepest descent (ASD) and Amir's (AMM) methods respectively.

The reasoning described above is supported by the following observations:

a. The convergence of adaptation is faster. It took 541, 443, and 220 steps for the SD, ASD and AMM methods respectively to reach the stopping condition of adaptation error less than 1.5 x 10^-11, as shown in Table II-1.

b. The adaptation procedure indeed reached steady state once the adaptation error is less than the stopping condition.

c. The steady state error is smaller than that of the LMS algorithm, as shown in Table II-1. In fact, the output variance is equal to that of the optimum filter.


(3) Conjugate Gradient Approaches. Two different methods were developed. Their results are shown in Figures 2.7 and 2.8 for the Fletcher-Reeves (CGF) and the Pollack (CGP) methods respectively.

Again, the improvements are clearly seen. In fact, they are even better than the gradient methods. The convergence took only 66 and 10 steps for the CGF and CGP methods respectively to reach below the stopping condition of 1.5 x 10^-11. At the same time, the output variance is the same.

(4) Variable Metric Approach. Results of this approach, which is extended from the one dimensional work of Davidon-Fletcher-Powell, are shown in Fig. 2.9. Again, the improvements are clearly seen. The background suppression result is the same as measured by the output variance and processing gain. But the convergence speed is even better. It took only 9 iteration steps to reach below the stopping condition.


3. Results of mMSE Adaptive Spatial Filters II - China Lake Image

The test results of adaptive filters based on the mMSE criterion and using the China Lake test image are presented in the following figures:

Fig. 2.10 - LMS approach, steepest descent method

Fig. 2.11 - Gradient approach, steepest descent method

Fig. 2.12 - Gradient approach, accelerated steepest descent method

Fig. 2.13 - Gradient approach, Amir's method

Fig. 2.14 - Conjugate gradient approach, Fletcher-Reeves method

Fig. 2.15 - Conjugate gradient approach, Pollack method

Fig. 2.16 - Variable metric approach, Davidon-Fletcher-Powell method

In each figure, three results are presented as functions of iteration steps: filter coefficients, output variance and processing gain.

Further, additional results are summarized and presented in Table II-2:

- Processing gain 


- Mean of the filtered image 
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- Variance of the filtered image 


- Number of iteration steps to go below the 
prescribed stopping error 


- Actual adaptation error when the adaptation 
is stopped. 


a. Discussion 

The results using the China Lake image are generally similar to those using the Indiana image. Only the important features will be summarized below.

(1) LMS Approaches. The adaptation based on the LMS approach again shows three problems: slow convergence, failure to reach steady state, and misadjustment.

(2) New Approaches Developed in this Thesis. All new approaches achieve the same steady state performance, equal to that of the optimum filter, as shown in Table II-2:


5 


Mean of the filtered image.......... 6.495 -10 
Z 


Ul 


Meniance of the filtered image...... eee 10” 


However, they converge to the steady state value in far fewer steps, as also shown in Table II-2.

Therefore, test results on the China Lake image again demonstrated the improvements in adaptive filters using the approaches suggested in this thesis.

It is interesting to note that the effectiveness of background clutter suppression in the case of the China Lake image is not as good as that in the case of the Indiana image. For example, the processing gain for the China Lake image is 19.32 dB compared with 29.874 dB for the Indiana image. We believe this difference is related to the spatial correlation of the image. The higher the correlation, the better is the background clutter suppression. The Indiana image is more spatially correlated than the China Lake image.
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A. Results of MSNR Adaptive Spatial Filter [ - 
Indiana Image 


The test results of MSNR adaptive spatial filters 
using Indiana test image are presented in the following 
figures. 

Fig. 2.17 - Gradient approach, steepest descent method 


Fig. 2.18 - Gradient approach, accelerated steepest 
descent approach 


Fig. 2.19 - Conjugate gradient approach, Fletcher-Reeves 
method 


Fig. 2.20 - Conjugate gradient approach, Pollack method 


Fig. 2.21 - Variable metric approach, Davidon-Fletcher- 
Powell method 


Fig. 2.22 - Amir's transform approach. 

In each figure, four results are presented as func- 
tions of iteration steps: filter coefficients, output var- 
iance, processing gain and output signal to noise ratio. 

Further, additional numerical results are summarized 
and presented in Table II-3. 


- Output signal to noise ratio
- Processing gain
- Mean of filtered image
- Variance of filtered image
- Number of iteration steps to reach below the prescribed stopping error
- Actual adaptation error.
Discussion: 
a. In the mMSE adaptive filter study, we first presented the results of adaptive filter design by the LMS algorithm because it is the most frequently used method. We extended it to two dimensions and used it as a benchmark for comparison. For the MSNR criterion, we have not yet found any past study of adaptive filters using this method. Therefore, comparisons of convergence results are based on several methods developed in this thesis study.

b. However, we can compare the background clutter suppression results of the mMSE and MSNR adaptive filters. For point targets, their steady state filter coefficients are the same if the coefficients of the estimation pixel are all normalized to unity. Therefore, the statistical properties of the filtered image are the same, i.e., the error variance and the mean of the image after processing by the two types of filters are identical. For the Indiana image, the mean and variance of the unfiltered and filtered images are:

              Before filtering      After filtering
mean ees Wiig 0.00006495 


variance 0.74111 De OaeZ 


c. The convergence speeds are different, as shown in Table II-3. For the prescribed stopping condition, the numbers of iteration steps to reach below this condition are:

SD  = 739     CGF = 76     DFP = 25
ASD = 739     CGP = 76     AT  = 2




Fig. 2.17a Steepest Descent Method - MSNR 
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Fig. 2.17c Steepest Descent Method - MSNR 
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Fig. 2.18a Accelerated Steepest Descent - MSNR 








[Plot: Filter Vector versus ITERATION #]
Fig. 2.18b Accelerated Steepest Descent - MSNR

[Plot: Error Variance versus ITERATION #]
Fig. 2.18c Accelerated Steepest Descent - MSNR
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Fig. 2.19a Fletcher-Reeves Method - MSNR 
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Fig. 2.19c Fletcher-Reeves Method - MSNR 
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Fig. 2.20a Pollack Method - MSNR 
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Fig. 2.2la Davidon-Fletcher-Powell Method - MSNR 
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Fig. 2.2lc Davidon-Fletcher-Powell Method - MSNR 
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Fig. 2.22a Amir's Transform Method - MSNR 
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Fig. 2.22c Amir's Transform Method - MSNR 
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The same trend in mMSE cases is found for the 
MSNR cases. The variable metric method (DFP) is faster than 
the conjugate gradient methods (CGF, CGP) which are faster 
than the gradient methods (SD, ASD). 

It is important to point out that the transform 
method (AT) which does not have a corresponding method in 
the mMSE cases has the fastest convergence speed. It took 
only two steps compared with the twenty-five steps required 
for the variable metric method to reach below the stopping 


@onaition. 
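
The ordering reported above (gradient methods slowest, conjugate
gradient methods faster) can be reproduced qualitatively on a toy
problem. The following Python sketch is an illustration only: it
minimizes a small, ill-conditioned quadratic rather than the MSNR
criterion of this thesis, and the matrix A, the vector b, the
tolerance and the function names are all assumptions chosen for
the example.

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-8, max_iter=100000):
    """Minimize 0.5*x'Ax - b'x by exact line searches along -gradient."""
    x = x0.astype(float).copy()
    for k in range(max_iter):
        g = A @ x - b                    # gradient of the quadratic
        if np.linalg.norm(g) < tol:      # stopping condition (assumed)
            return x, k
        alpha = (g @ g) / (g @ A @ g)    # exact step length for a quadratic
        x = x - alpha * g
    return x, max_iter

def fletcher_reeves(A, b, x0, tol=1e-8, max_iter=100000):
    """Minimize the same quadratic with Fletcher-Reeves conjugate gradients."""
    x = x0.astype(float).copy()
    g = A @ x - b
    d = -g                               # first direction: steepest descent
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            return x, k
        alpha = -(g @ d) / (d @ A @ d)   # exact step length along d
        x = x + alpha * d
        g_new = A @ x - b
        beta = (g_new @ g_new) / (g @ g) # Fletcher-Reeves coefficient
        d = -g_new + beta * d            # new conjugate direction
        g = g_new
    return x, max_iter

# Hypothetical test problem with condition number 1000.
A = np.diag([1.0, 10.0, 100.0, 1000.0])
b = np.ones(4)
x0 = np.zeros(4)

x_sd, n_sd = steepest_descent(A, b, x0)
x_cg, n_cg = fletcher_reeves(A, b, x0)
print("steepest descent steps:", n_sd)
print("Fletcher-Reeves steps: ", n_cg)
```

On a quadratic in four unknowns the conjugate gradient iteration
terminates in about four steps, while steepest descent needs many
more when the problem is ill-conditioned. This is the same
qualitative gap seen between the SD and CGF step counts above.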


5. Results of MSNR Adaptive Spatial Filters
II - China Lake Image

Test results of MSNR adaptive spatial filters using
the China Lake test image are presented in the following
figures:

Fig. 2.23 - Gradient approach, steepest descent
method

Fig. 2.24 - Gradient approach, accelerated steepest
descent method

Fig. 2.25 - Conjugate gradient approach, Fletcher-
Reeves method

Fig. 2.26 - Conjugate gradient approach, Pollack
method

Fig. 2.27 - Variable metric approach, Davidon-
Fletcher-Powell method

Fig. 2.28 - Amir's transform method.


Several numerical results are presented in Table II.4:

Output signal to noise ratio
Processing gain
Mean of filtered image
Variance of filtered image
Number of iteration steps to reach below
the prescribed stopping error
Actual adaptation error.
Discussion 

a. Gradient approaches have not been included in 
these presentations because their convergence speeds are not 
as fast as the conjugate gradient, variable metric and Amir's 
transform methods. 

b. Again, the Amir transform method has the fastest
convergence speed. It took only three steps to reach below
the stopping condition, compared with the fifteen steps
required by the next fastest method, the variable metric
method.

c. Based on the experience with the Indiana image
and the China Lake image, the comparative behaviors of
these methods are similar.
Fig. 2.25a Fletcher-Reeves Method - MSNR
Fig. 2.26a Pollack Method - MSNR
Fig. 2.27a Davidon-Fletcher-Powell Method - MSNR
Fig. 2.27c Davidon-Fletcher-Powell Method - MSNR
Fig. 2.28a Amir's Transform Method - MSNR

III. MULTIPLE MICROCOMPUTER SYSTEM


A. INTRODUCTION
1. General

Signal processing algorithms are usually developed
on main frame computers. The transfer of these algorithms
to on-board processors in practical systems is, in general,
not an easy task because real systems impose many constraints,
such as processing speed, weight, volume, power, fault
tolerance and others. This thesis undertook both the
theoretical development task and the practical implementation
investigation. Specifically, this chapter presents the
second part of this thesis, which considers the implementation
of the adaptive image processing algorithms developed in the
last chapter on a multiple microcomputer system using
concurrent parallel and pipeline processing.

It is important to point out that the digital computer
is not the only technique for real time implementation.
Depending on the amount and rate of signal data, the precision
and dynamic range requirements, the need for programmability
and several other factors, different signal formats, device
technologies and signal/data processor architectures should
be considered. In many cases, combinations of analog,
sampled analog and digital processing approaches using optical,
electronic and acoustical devices probably will offer cost-
effective and optimum performance [178-180]. However, with
the rapid advances of VLSI/VHSIC technologies in both
increasing speed and decreasing power, size and cost, the
importance of digital electronic implementation in the form
of distributed processing using multiple processors has been
increasing at a rapid rate and will undoubtedly play a more
and more important role in real on-board implementation.
This part of the thesis investigates and develops the
feasibility and potential of using multiple microcomputer
systems for real implementation.
2. Multiple Processor Developments

Multiple microcomputer systems are a subset of larger
families of multiple processor systems whose development
started over twenty years ago. It has been obvious for a long
time that several processors are better than one. However, how
they should be connected together and effectively used has
not been obvious at all. The answer depends on many factors.
First, what is the objective? Is it real time processing,
fault tolerance, multiple users, security, or some combination
of these? Second, what are the available technologies
in both hardware and software? Third, what are the
constraints in cost, weight, volume, development time and
available manpower? The answers have been very different
depending on many of these factors. We can identify several
major areas of multiple processor developments since the
early 1960's.

a. Supercomputers [151, 152]

The first area can be generally called the
"supercomputers." Several processors were connected in
different ways to offer parallel processing [153-155], pipeline
processing [156-158] or combined parallel/pipeline processing
capabilities. In some cases, specially designed signal
processors, called array processors, are connected to a host
computer to offer very fast data crunching capabilities. In
most of these cases, the basic processing elements forming the
multiple processor systems are special arithmetic or signal
processing units, not stand-alone computers. Their
intercommunications and the signal flow are usually fixed in
the design stage to achieve very fast computing speed and are
not changed during operation. Several representative systems
are listed in Table III-1. Their common objective is "fast
computation" and "high throughput." The processing elements
are tightly coupled.

b. Computer Networks [161, 162]

The second area can be generally called the
"computer network." Several processors are connected
together for intercommunication and resource sharing. The
basic processing elements are mainly stand-alone computers.
A problem is usually not partitioned and performed
concurrently on several processing elements. The system is, in
general, loosely coupled. The communication is carried out
by messages with appropriate synchronization codes at the
beginning and the ending of the message.





c. Ultra-Reliable, Fault Tolerant or Highly
Available, Graceful Degrading Computers
[166, 167]

The third area can be generally called "Fault
Tolerant or Highly Available" computers. Multiple processing
elements have been connected in different ways to offer
either fail-soft, fail-safe or graceful degradation
capabilities. In most fault tolerant computers, the redundancy
and/or sparing are usually made at the building block level,
such as the CPU, RAM, I/O ports, etc., to make a very reliable
and fault tolerant single computer [168]. The
intercommunications among the elements are generally fixed. In
recent years, because of the steady decrease of the cost of a
computer, the basic processing elements in a multiple processor
system have become a small number of stand-alone computers
[169-171]. These systems started a new direction in multiple
processor developments because the intercommunications among
the processing elements are no longer fixed. The processing
tasks can be flexibly assigned to different processors. This
dynamic assignment, or allocation capability, also allows a new
system/software approach to fault tolerance and fault repair.

3. Multiple Microcomputer System Developments

The rapid advance of low cost and small microcomputers
has extended the development described above into a new
dimension because a large number of microcomputers, instead of
just a few, can conceivably be interconnected into a system.
Not only can its fault tolerance capability be further
increased, but the computational or signal processing
capability can also be much enhanced by providing concurrent
parallel and pipeline processing capabilities.

Multiple minicomputer system development began at
Carnegie Mellon University with the Cmmp system [172].
Although it used PDP-11 minicomputers, its tightly coupled
architecture and dynamic memory allocation concept allowed a
relatively large number of processing elements to join
together into a single system. This development was soon
followed by a tightly coupled multiple microcomputer project,
CM* [173], also at Carnegie Mellon University. Since that
time, several tightly coupled systems have been proposed
[174-183]. Some of them have gone beyond the conceptualization
stage and started serious hardware/software development
efforts. However, none has reached the operational stage at
this writing.

At the same time, another direction of multiple
microcomputer development has been pursued toward the "computer
network" objective [184-188]. These systems can be
distinguished from the developments described above in the
following major aspects:

° Different types of processing elements are used.
In other words, they are "heterogeneous."

° The processing elements are loosely coupled.

° The bandwidth of the intercommunication buses is
relatively low.





4. This Thesis Research 

The second part of this thesis research is to develop
a multiple microcomputer system and to investigate its
feasibility in implementing real time on-board signal/data
processing for a smart sensor system. It is similar to a number
of multiple microcomputer systems in development in the past
three to four years which permit up to 16 microcomputers to
be interconnected in some way to perform computations.
However, their objectives, architectures, intercommunication
concepts, controllers, hardware buses and processing elements,
software operating systems, etc. are quite different.

This thesis project is presented by highlighting the
following features:

a. Its objectives are to provide a multiple tasking
system including fast image/signal processing capability and
other more moderate speed but highly reliable signal/data
processing capability for system management, command and
control.

b. Some of the signal/data processing tasks will be
performed by tightly coupled processors. But the processors
performing other tasks do not all have to be tightly coupled
together. Therefore, a mixed tightly and loosely coupled
system is envisioned.

c. A part of the system must perform some critical
tasks which require ultra-reliability. Other parts of the
system only require fail-soft and graceful degradation
performance. In any event, a dynamic allocation capability is
required which allows flexible assignment of microcomputers to
perform various tasks and thereby provides some fault
tolerance.

For these requirements, a multiple star/multiple
cluster system of 16 bit microcomputers was developed. Its
general concept and philosophy were developed by a top-down
system design procedure which will be presented in the next
section, III.B. It will be explained how a choice was made
considering several alternatives and seven important issues
related to the system. In Section III.C, detailed
implementation of these choices will be presented by describing
the principles and circuits of this multiple microcomputer
system in five categories:

System architecture
Processing resources
Intercommunication network
Intercommunication procedures
Multibus communication.

The performance of this system is described in Section III.E.
B. DESIGN CONSIDERATIONS FOR THIS MULTIPLE
MICROCOMPUTER SYSTEM

1. Introduction

Although only two large multiple microcomputer systems
and one multiple minicomputer system have appeared in the
literature and reached operational status, a large number of
different architectures have been proposed and some are in
the process of being implemented. The three operational
systems are all from Carnegie-Mellon University: CM*
[172], Cvmp [191] and Cmmp [173]. There are now many options
for the hardware and software design of a multiple
microcomputer system.

This thesis took a top-down system design approach
to reach the choices made for the design of our system. This
design process is presented in several steps in this section
to explain the general idea and philosophy of this system.
In the next section, the detailed design of various parts
will be described.

2. Architecture

This thesis is primarily concerned with the
implementation of adaptive image processing. It is important,
however, to realize that the adaptive filter is only one part
of a longer end-to-end image processing program for detecting,
tracking and recognizing targets in noisy images. The adaptive
spatial filter enhances the target signal to noise ratio by
suppressing the background clutter; this ratio may be enhanced
further by additional image processing techniques, such as
adaptive temporal filters. The clutter suppression
stage is followed by thresholding, target acquisition,
recognition and tracking stages. These signal processing
operations are quite different. For example, adaptive spatial
filters require the computation of statistical image
characteristics and the solution of matrix equations. Adaptive
thresholding requires the comparison and rearrangement of real
numbers. Target acquisition usually involves pattern tests of
numbers based on spatial, temporal and/or spectral information.
Therefore, although each individual signal processing stage
requires real-time or fast execution speed, different signal
processing stages do not depend on one another during
processing. Furthermore, it is important to realize that
processing of target signals for the mission objective is only
one part, although a very important part, of the total
signal/data processing requirements for the whole system. There
are processing functions such as management, control,
communication and others which must also be implemented. The
nature and requirements of their processing operations are
quite different and vary over a wide range. Some do not need
high processing speed but demand very high reliability. Others
do limited computation but handle large amounts of data. In
general, the signal/data processing requirements of many
systems cover a wide range. Therefore, we designed an
architecture which has several levels of coupling among
processing elements.

At the first level, special processors may be directly
coupled to a microcomputer. At the second higher level, several
microcomputers are connected to the same system bus in
parallel and form a "cluster." A microcomputer can communicate
with any other microcomputer on the same bus or within
the same cluster directly through common memory. It is a
tightly coupled, bus oriented multiple microcomputer
architecture. At a higher level, the third level, four clusters
are connected by way of a "complete star" bus switch network and
form a "star." The communication of microcomputers between
two clusters is accomplished by way of the switch network.
Therefore, they are not as tightly coupled as microcomputers
within a cluster because there will be more overhead in
intercluster communication than in intracluster communication.
However, it was found that, using specially designed
controllers for the intercluster communication, the access time
was increased by only 9%. This data is presented in Section
III.E. Therefore, we can consider that microcomputers in
different clusters within the same star are still tightly
coupled. At the next higher level, the fourth level, several
"stars" are connected together by linking nearest neighboring
"stars" through a bus switch to form a "lattice network." The
intercommunication between microcomputers from two stars is
similar to that within a star, involving one central controller
and two distributed controllers. The overhead is practically
the same. Therefore, from the intercommunication viewpoint,
microcomputers from two stars, and also throughout the system,
are practically all tightly coupled. However, through
programming, they may be used either in tight coupling, loose
coupling or any combination in between to suit the
requirements of the applications.
3. Intercommunication and Control

Because of the hierarchical structure of the
architecture, the intercommunication processes and their
controls are also hierarchical and are distributed. They are
hierarchical because there are three levels of controls as
shown in Table III.1.

At the lowest level of intracluster communication,
no bus switch is needed. A Random Priority Controller (RPC)
is used for arbitration. Only a small portion of the
distributed controller is used, mainly to check if requests
outside the cluster have been granted. At the next higher
level of intercluster communication, the intrastar bus switch
is used. Arbitration is accomplished by both the distributed
controller and the RPC. Only a portion of the central
controller is used to grant the intercluster request. At the
highest level, both interstar and intrastar bus switches may be
used and all controllers, central, distributed and random
priority, are in full action.
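
The three control levels described above can be summarized as a
small lookup. The Python sketch below is only an illustration of
the text: the Address tuple, the function name and the string
labels are hypothetical and do not correspond to any circuit or
software of the system.

```python
from typing import NamedTuple

class Address(NamedTuple):
    """Hypothetical location of a microcomputer in the lattice."""
    star: int      # which star node
    cluster: int   # which cluster within the star
    sbc: int       # which microcomputer within the cluster

def communication_level(src: Address, dst: Address) -> dict:
    """Return the bus switches and controllers involved, per the text."""
    if src.star == dst.star and src.cluster == dst.cluster:
        # Lowest level: same system bus, no bus switch needed.
        return {"level": "intracluster",
                "bus_switches": [],
                "controllers": ["RPC", "distributed (small portion)"]}
    if src.star == dst.star:
        # Next level: different clusters of the same star.
        return {"level": "intercluster",
                "bus_switches": ["intrastar"],
                "controllers": ["RPC", "distributed", "central (portion)"]}
    # Highest level: different stars of the lattice.
    return {"level": "interstar",
            "bus_switches": ["interstar", "intrastar"],
            "controllers": ["RPC", "distributed", "central"]}

same_bus = communication_level(Address(0, 1, 3), Address(0, 1, 5))
same_star = communication_level(Address(0, 1, 3), Address(0, 2, 0))
other_star = communication_level(Address(0, 1, 3), Address(1, 0, 0))
print(same_bus["level"], same_star["level"], other_star["level"])
```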

Further, the controllers are distributed because
there are four identical RPCs and distributed controllers,
one in each cluster. Although there is only one central
controller, it consists of four identical units, one for each
cluster. The advantages of such a distributed control system
are: (1) Parallel control actions which enhance the speed of
"request arbitration." (2) Improved fault tolerance because
the control actions are shared between four separate units
and, should one malfunction, the other three can still
continue their functions.


4. Hardware Implementation of Controllers 


Controller circuits can be implemented in several
ways:

a. Microprocessor control
b. Bit slice processor control
c. Digital logic circuit control.

Two performance characteristics should be considered in their
choice and design: programmability and speed. The
microprocessor approach has the most versatile programmability
but the slowest speed. The digital hardware approach has very
limited programmability but the fastest speed of the three.
An estimate has been made to compare their speeds.

In our design, the primary goal is to offer the fastest
response to and arbitration of requests and the fastest
communication speed. Therefore, we chose the digital logic
circuit approach. Great care was given to the design of
controller concepts and circuits, to avoid unexpected changes.
Further, Schottky and low power Schottky chips are used because
of their speed and power trade-offs. CMOS chips were found to
be too slow and do not have adequate driving capability.

5. Priority Resolver

There are several ways to arbitrate multiple requests
and resolve priorities:

Fixed priority
Rotating priority
First in, first out (FIFO)
Random priority

There are two primary requirements for a priority
resolver circuit: uniform and fast resolution of bus
requests. In this system, an Intel Multibus is used as the
system bus with a 10 MHz bus clock frequency. We decided to
design a priority resolver circuit to arbitrate 8 SBCs within
one bus clock.

The fixed priority approach was not selected because 
it was unable to arbitrate multiple bus requests and grant 
their usages uniformly. Test results showed that in our 
tightly coupled environments, two SBCs are able to share the 
bus adequately. More than two SBCs produce unacceptable 
delays. 

Rotating priority is much faster than the fixed
priority approach. It is able to arbitrate multiple requests
and does grant their bus usages uniformly. However, it was
not our final choice because the random priority approach was
found to be faster. This is because in the rotating priority
approach, every bus request line is tested serially (in a
rotating manner) whether there are request signals on these
lines or not. In the worst case, the rotating priority
resolver grants the bus after N searches, where N is the number
of SBCs being arbitrated by the resolver.

First in, first out (FIFO) is a resolver approach
which requires memory. Because of the time needed to
reference the memory, it is not possible to build a FIFO
resolver to arbitrate 8 bus requests within 100 nsec, the bus
clock period. With current technology, a fast FIFO arbiter
probably requires more than 300 nsec.

The random priority resolver is designed based on
the binary tree synchronous selector concept. Consider our
case of 8 SBCs in a cluster. Three-stage selection is used.
During the first stage, four out of eight lines are checked
simultaneously. In the second stage, two out of these four
lines are checked simultaneously again. The final bus grant
is made in the third stage. In other words, the time for
searching and resolving the bus requests is proportional to
log2 N, which is faster than the rotating priority resolver.
Test results have shown that the random priority resolver is
able to arbitrate multiple bus requests and grant their bus
usages uniformly, as demonstrated in Fig. 3.17. Four SBCs
simultaneously sharing the bus in a tightly coupled
environment are taken for the test case. These four SBCs were
programmed to request the bus usage almost 100% of the time.
The BPRN signals of these four SBCs are shown. A low signal of
BPRN indicates that its SBC is using the system bus. The fact
that none of these four traces showed any long periods of bus
usage or bus waiting demonstrated that the random priority
resolver is able to arbitrate very heavy bus requests by these
four SBCs and grant bus usage to them "uniformly." It was
found that, on the average, a "bus request" is granted in
about 60 nsec.
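
The difference between the N searches of the rotating resolver
and the log2 N stages of the binary tree selector can be
sketched in software. The following Python model is only an
illustration: the function names are hypothetical, and the rule
that each pair of competing lines is resolved by a random choice
is an assumption made for the sketch, not a description of the
actual circuit.

```python
import random

def rotating_grant(requests, last):
    """Test request lines serially, starting after the last grantee.

    Returns (granted line or None, number of lines tested)."""
    n = len(requests)
    for step in range(1, n + 1):
        idx = (last + step) % n
        if requests[idx]:
            return idx, step
    return None, n

def tree_grant(requests, rng):
    """Binary tree selection: halve the candidate lines at every stage.

    Assumes len(requests) is a power of two. Returns (granted line or
    None, number of stages), which is log2 of the number of lines."""
    n = len(requests)
    if n < 1 or n & (n - 1):
        raise ValueError("number of request lines must be a power of two")
    candidates = [i if r else None for i, r in enumerate(requests)]
    stages = 0
    while len(candidates) > 1:
        survivors = []
        for a, b in zip(candidates[0::2], candidates[1::2]):
            if a is not None and b is not None:
                survivors.append(rng.choice((a, b)))  # assumed random tie-break
            else:
                survivors.append(a if a is not None else b)
        candidates = survivors
        stages += 1
    return candidates[0], stages

rng = random.Random(0)

# Worst case for 8 lines: rotating needs 8 tests, the tree needs 3 stages.
print(rotating_grant([False] * 7 + [True], last=7))
print(tree_grant([True] * 8, rng)[1])

# Four SBCs requesting all the time, as in the test case of the text.
busy = [True, True, True, True, False, False, False, False]
grants = [0] * 8
for _ in range(4000):
    winner, _stages = tree_grant(busy, rng)
    grants[winner] += 1
print(grants[:4])
```

In this model the four busy lines each receive about one quarter
of the grants, which mirrors the uniform bus usage reported for
the hardware resolver.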
6. Bus Switches 

Bus switches are one of the most important parts of 
a multiple microcomputer system because they provide the in- 
terconnection means among the processing resources. There 
are two aspects of the "bus switch" problem: bus switch 
network and the individual bus switch link. 

Many switch networks have been investigated; some
predate the computer developments [195]. A small number
of them have been considered in the multiple microcomputer
development: cross-bar, banyan, hyperconcentrator, simple 
ime, etc. 

A combined approach with two levels of switching
networks was selected because of the multitask signal/data
processing requirements in a typical system.
At the higher level, many stars are interconnected in a
lattice architecture. Interstar bus switches are provided
between neighboring nodes. At the lower level, four clusters
are included in each "star" node. They are interconnected by
a "complete star bus switch" network. The complete star
switch is chosen for two reasons. First, the coupling within
a star should be as tight as possible. The complete star
switch allows us to connect two clusters by the shortest link.
Second, if a link fails, the complete switch gives us two
choices to connect two clusters by way of a third cluster via
two links, thus providing some fault tolerance.
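
The two properties claimed for the complete star switch, one
direct link per pair of clusters and two alternative two-link
routes when that link fails, can be checked by enumeration. The
Python sketch below is illustrative only; the cluster numbering
and the function name are assumptions.

```python
from itertools import combinations

CLUSTERS = (0, 1, 2, 3)  # four clusters in one star (numbering assumed)

# Complete star network: one bidirectional link per pair of clusters.
LINKS = frozenset(frozenset(pair) for pair in combinations(CLUSTERS, 2))

def routes(src, dst, failed=frozenset()):
    """List usable routes: the direct link, else detours via a third cluster."""
    usable = LINKS - failed
    if frozenset((src, dst)) in usable:
        return [(src, dst)]
    return [(src, via, dst)
            for via in CLUSTERS
            if via not in (src, dst)
            and frozenset((src, via)) in usable
            and frozenset((via, dst)) in usable]

print(len(LINKS))                                  # links in the network
print(routes(0, 1))                                # direct link available
print(routes(0, 1, failed={frozenset((0, 1))}))    # detours after a failure
```

Four clusters give six links, and a single failed link leaves
two detours, each through one of the remaining clusters.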
The important part of the individual bus switch link
is the switches themselves. For the Intel Multibus, we found
that 58 of the 86 lines should be switched. There are several
choices for the switches:

Bidirectional: MOS types of switches, such as CMOS, VMOS
and DMOS.

Unidirectional: Bipolar types such as Schottky, low
power Schottky and ECL types; optoelectronic types.

Optoelectronic types of switches were not chosen because
they are slow, on the order of 10 µsec. Very fast switching
speeds, on the order of several tens of nanoseconds, are
required because today's Multibus is running at 10 MHz, which
corresponds to a clock period of only 100 nsec. CMOS, VMOS and
DMOS switches could provide such switching speeds. However,
they do not have enough driving capability for the 15 mA
or more required by many of the control and address signals
of the microcomputer. Therefore, these MOS switches were not
chosen, although their bidirectional feature and the low power
characteristics of the CMOS switches are extremely attractive
and reliable. We chose the low power Schottky switches
because of their speed and driving capability. A typical
performance is shown in Fig. 3.15, which shows the waveform of
an address signal before and after the switch. It can be seen
that not only is the delay short, on the order of 25 nsec,
but also the waveform is improved by the switch because of
its good driving capability of up to 50 mA. It was tested
with a minimum load resistance of 50 ohms and a maximum
capacitance of [illegible] pF, and the switch continued to
function satisfactorily up to 45 MHz. One disadvantage is the
need to use two back-to-back switch circuits for bidirectional
switching of each signal. Therefore, a special circuit was
designed to provide not only the "enable" signal but also the
"direction" signal.
7. Processing Elements

There are two major types of processing elements on
the system bus: general purpose microcomputers and special
purpose processors, which can further be separated into two
subcategories. One is a special purpose processor like an
array processor which can perform several signal processing
operations such as fast Fourier transform, correlation,
convolution, finite impulse response filtering, infinite
impulse response filtering, etc. The second type is a special
purpose processor which is designed to perform only one signal
processing operation, such as the FFT.

a. General Purpose Microcomputer 

It was decided that all general purpose
microcomputers used in our system should be treated
homogeneously. This is necessary because two major principles
of our operating system are based on the "virtual processor"
[189] and "dynamic process allocation" [190] concepts, which
require homogeneous processing elements.

b. Special Purpose Processors

It was decided that special purpose processors
could not be treated in the same way as the microcomputers.
However, it has not been decided at this time exactly how
these special purpose processors should be handled. There
are two important alternatives. In one case, a special
purpose processor is treated as an I/O port managed by the
operating system. In the other case, a special purpose
processor can be operated in a "slave" mode on the system bus.
8. Mode of Data Transfer 

The basic mode of data transfer in most of the
multiple processor systems is based on "message transfer"
communication. However, a basic philosophy of our operating
system is the "loop free" structure, which requires frequent
references to synchronization primitives. In other words, the
operating system program on a microcomputer needs to
reference synchronization primitives located in either internal
or external global memories. These "references" are executed
via the system bus. If the data transfer is "message" based,
the synchronization of processes could be delayed because
the system bus is being occupied by a long message transfer.
In order to avoid such a delay, it was decided that the basic
mode of data transfer should be based on the "word transfer."
This allows several microcomputers to reference their
synchronization primitives and other data in an "interleave"
mode.
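
The argument for word transfer can be made concrete with a
rough worst-case estimate. The Python sketch below is a
simplified model, not a measurement: it assumes one word is
transferred per 100 nsec bus cycle, that every competing
microcomputer gets exactly one turn before the synchronization
reference is served, and the function name and the example
numbers are hypothetical.

```python
def worst_case_wait_ns(transfer_mode, words_per_message,
                       competing_masters, bus_cycle_ns=100):
    """Worst-case wait before a one-word synchronization reference
    obtains the system bus, under the simplifying assumptions above."""
    if transfer_mode == "message":
        # Each competitor holds the bus for a whole message.
        words_ahead = competing_masters * words_per_message
    elif transfer_mode == "word":
        # Competitors are interleaved one word at a time.
        words_ahead = competing_masters
    else:
        raise ValueError("transfer_mode must be 'message' or 'word'")
    return words_ahead * bus_cycle_ns

# Hypothetical example: 3 competing microcomputers, 256-word messages.
message_wait = worst_case_wait_ns("message", 256, 3)
word_wait = worst_case_wait_ns("word", 256, 3)
print(message_wait, word_wait)
```

Under these assumptions the delay of a synchronization reference
grows with the message length in the message mode but is
independent of it in the word mode.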

However, the transfer of data in "blocks" is possible
if required. This is accomplished by a special feature of
the Intel 16 bit 8086 microprocessor, which can generate a bus
lock signal of a duration specified by software. This bus
lock signal holds the bus for the completion of the block
transfer. Thus, data transfer by "message based
communication" is possible as well.


C. DESCRIPTION OF THIS MULTIPLE MICROCOMPUTER SYSTEM
1. Introduction

In the last section, we presented the reasons
for choosing the specific approaches for various parts of
our multiple microcomputer system based on a top-down design
procedure to meet the requirements of this type of smart
sensor system. In this section, a more detailed description
will be given to explain how those choices are implemented.
The presentation will be made in five major categories:

System architecture (Section C.2)

Processing resources (Section C.3)

Intercommunication network (Section C.4)

Intercommunication procedures among resources
in different clusters and stars (Section C.5)

Multibus communication (Section C.6)

Performance of this multiple microcomputer system
will be presented in Section D.
2. System Architecture

The topology of this system consists of many "star"
nodes interconnected by links to nearest neighbor stars. A
two dimensional example is shown in Fig. 3.1. Each star has
four links connected to its four neighbors. The links are
bidirectional system buses with a bus switch, called the
"inter-star bus switch" (ISBSW). The "bus switch" consists
of 60 bidirectional switches for 60 signal lines. Two types
of switches have been investigated: one with latches and
one without latches for the signal lines.

Each star consists of four clusters interconnected
by a complete star "bus-switch network." Each "cluster"
consists of up to eight microcomputers. Other processing
elements and one or more RAM boards are also connected onto
the system Multibus. Fig. 3.2 depicts the topology of a
single star with four clusters. In this example, the bus
switch network consists of six bidirectional system buses,
each with a bus switch, interconnected as shown in Fig. 3.7.


3. Processing Resources

Two types of processing resources are used in this
system.

a. Basic Processing Elements - SBC 8612A

Intel 16 bit single board microcomputers, SBC
8612A, are used as the basic processing elements. A block
diagram of the SBC 8612A is shown in Fig. 3.3.

(1) The Single Board Microcomputer SBC-8612A.
The iSBC 8612A Single Board Computer is a 16 bit single board
computer, a complete computer system on a single printed
circuit assembly. The iSBC 8612A board includes a 16 bit
central processing unit (CPU), up to 32K bytes of dynamic RAM,
a serial communications interface, three programmable parallel
I/O ports, three programmable timers, priority interrupt
control, Multibus interface control logic, and bus expansion
drivers for interface with other Multibus interface-compatible
expansion boards. Also included is dual port control logic to
allow the iSBC 8612A board to act as a slave RAM device to
other Multibus interface masters in the system. Provision
is made for user installation of up to 16K bytes of read
only memory.

The iSBC 8612A Single Board Computer is
controlled by an Intel 8086 16 bit microprocessor (CPU).
The 8086 CPU includes four 16 bit general purpose registers
that may also be addressed as eight 8 bit registers. In
addition, the CPU contains two 16 bit pointer registers and
two 16 bit index registers. Four 16 bit segment registers
allow extended addressing to a full megabyte of memory. The
CPU instruction set supports a wide range of addressing modes
and data transfer operations, signed and unsigned 8 bit and
16 bit arithmetic including hardware multiply and divide, and
logical and string operations. The CPU architecture features
dynamic code relocation, reentrant code, and instruction
look-ahead.
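
As a brief illustration of how the 16 bit segment registers extend addressing to a full megabyte, the sketch below forms a 20 bit physical address with the standard 8086 rule (segment value shifted left four bits, plus a 16 bit offset). The function name `physical_address` is illustrative only and does not appear in the thesis.

```python
def physical_address(segment: int, offset: int) -> int:
    """Form an 8086 20 bit physical address from a 16 bit segment and offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16 bit values")
    # The segment is shifted left by 4 bits (multiplied by 16) and added to
    # the offset; the sum is truncated to the 20 address lines of the 8086.
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    # Top of the 32K on-board RAM as seen by the CPU (00000-07FFF hexadecimal).
    print(hex(physical_address(0x0000, 0x7FFF)))
    print(hex(physical_address(0x1000, 0x0234)))
```

Because many segment/offset pairs map to the same physical location, software is free to place segments on any 16 byte boundary.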

The iSBC 8612A board has an internal bus for
all on-board memory and I/O operations and accesses the system
bus (Multibus interface) for all external memory and I/O
operations. Hence, local (on-board) operations do not involve
the Multibus interface, making the Multibus interface
available for true parallel processing when several bus masters
(e.g., DMA devices and other single board computers) are
used in a multimaster scheme.

Dual port control logic is included to
interface the dynamic RAM with the Multibus interface so
that the iSBC 8612A board can function as a slave RAM device
when not in control of the Multibus interface. The CPU has
priority when accessing on-board RAM. After the CPU
completes its read or write operation, the controlling bus
master is allowed to access RAM and complete its operation.
Where both the CPU and the controlling bus master have the
need to write or read several bytes or words to or from
on-board RAM, their operations are interleaved. For CPU access,
the on-board RAM addresses are assigned from the bottom up
of the 1 megabyte address space; i.e., 00000-07FFF (hexadecimal). The
slave RAM address decode logic includes jumpers and switches
to allow positioning the on-board RAM into any 128K segment
of the 1 megabyte system address space.

The slave RAM can be configured to allow 
either 8K, 16K, 24K, or 32K access by another bus master. 
If the iSBC 300 Multimodule RAM option is installed, the 
memory increments are 16K, 32K, 48K, or 64K. Thus, the RAM 
can be configured to allow other bus masters to access a 
segment of the on-board RAM and still reserve another segment 
strictly for on-board use. The addressing scheme accommodates
both 16 bit and 20 bit addressing.





Four IC sockets are included to accommodate
up to 16K bytes of user-installed read only memory.
Configuration jumpers allow read only memory to be installed in 2K,
4K, or 8K increments.

The iSBC 8612A board includes 24 programmable
parallel I/O lines implemented by means of an Intel
8255A Programmable Peripheral Interface (PPI). The system
software is used to configure the I/O lines in any combination
of unidirectional input/output and bidirectional ports.
The I/O interface may be customized to meet specific peripheral
requirements and, in order to take full advantage of the
large number of possible I/O configurations, IC sockets are
provided for interchangeable I/O line drivers and terminators.
Hence, the flexibility of the parallel I/O interface is further
enhanced by the capability of selecting the appropriate
combination of optional line drivers and terminators to provide
the required sink current, polarity, and drive/termination
characteristics for each application. The 24 programmable
I/O lines and signal ground lines are brought out to a 50 pin
edge connector (J1) that mates with flat, woven, or round
cable.

The RS232C compatible serial I/O port is
controlled and interfaced by an Intel 8251A USART (Universal
Synchronous/Asynchronous Receiver/Transmitter) chip. The
USART is individually programmable for operation in most
synchronous or asynchronous serial data transmission formats
(including IBM Bi-Sync).

In the synchronous mode the following are programmable:

a. Character length,
b. Sync character (or characters), and
c. Parity.

In the asynchronous mode the following are programmable:

a. Character length,
b. Baud rate factor (clock divide ratios of 1, 16, or 64),
c. Stop bits, and
d. Parity.


In both the synchronous and asynchronous
modes, the serial I/O port features half- or full-duplex,
double buffered transmit and receive capability. In addition,
USART error detection circuits can check for parity,
overrun, and framing errors. The USART transmit and receive
clock rates are supplied by a programmable baud rate/time
generator. These clocks may optionally be supplied from an
external source. The RS232C command lines, serial data lines,
and signal ground lines are brought out to a 50 pin edge
connector (J2) that mates with flat or round cable.

Three independent, fully programmable 16 bit
interval timer/event counters are provided by an Intel 8253
Programmable Interval Timer (PIT). Each counter is capable
of operating in either BCD or binary modes; two of these
counters are available to the system's designer to generate
accurate time intervals under software control. The outputs
and gate/trigger inputs of two of these counters
may be independently routed to the 8259A Programmable
Interrupt Controller (PIC). The gate/trigger inputs of the two
counters may be routed to I/O terminators associated with
the 8255A PPI or as input connections from the 8255A PPI.
The third counter is used as a programmable baud rate
generator for the serial I/O port. In utilizing the iSBC 8612A
board, the systems designer simply configures, via software,
each counter independently to meet system requirements.
Whenever a given time delay or count is needed, software
commands to the 8253 PIT select the desired function.
The contents of each counter may be read at any time during
system operation with simple operations for event counting
applications, and special commands are included so that the
contents of each counter can be read "on the fly."

The iSBC 8612A board provides vectoring for
bus vectored (BV) and non-bus vectored (NBV) interrupts. An
on-board Intel 8259A Programmable Interrupt Controller (PIC)
handles up to eight NBV interrupts. By using external PICs
slaved to the on-board PIC (master), the interrupt structure
can be expanded to handle and resolve the priority of up to
64 BV sources.

The PIC, which can be programmed to respond
to edge-sensitive or level-sensitive inputs, treats each
"true" input signal condition as an interrupt request. After
resolving the interrupt priority, the PIC issues a single
interrupt request to the CPU. Interrupt priorities are
independently programmable under software control. The
programmable interrupt priority modes are:

(a) Nested Priority. Each interrupt
request has a fixed priority: input 0 is highest, input 7
is lowest.

(b) Fully Nested Priority. This mode is
the same as the nested mode, except that when a slave PIC is
being serviced, it is not locked out from the master PIC
priority logic and when exiting from the interrupt service
routine, the software must check for pending interrupts from
the slave PIC just serviced.

(c) Auto-Rotating Priority. Each interrupt
request has equal priority. Each level, after receiving
service, becomes the lowest priority level until the next
interrupt occurs.

(d) Specific Priority. Software assigns
lowest priority. Priority of all other levels is in
numerical sequence based on lowest priority.

(e) Special Mask. Interrupts at the level
being serviced are inhibited, but all other levels of
interrupts (higher and lower) are enabled.

(f) Poll. The CPU internal interrupt
enable is disabled. Interrupt service is achieved by
programmer initiative using a Poll command.

The CPU includes a non-maskable interrupt
(NMI) and a maskable interrupt (INTR). The NMI interrupt is
intended to be used for catastrophic events such as power
outages that require immediate action of the CPU. The INTR
interrupt is driven by the 8259A PIC which, on demand,
provides an 8 bit identifier of the interrupting source. The
CPU multiplies the 8 bit identifier by four to derive a
pointer to the service routine for the interrupting device.
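
The multiply-by-four step can be sketched as follows. On the 8086, each interrupt vector occupies four bytes at the bottom of memory (a 16 bit offset followed by a 16 bit segment, both little-endian), so the 8 bit identifier times four is the address of the vector entry. The function names and the sample table contents are hypothetical.

```python
from typing import Tuple


def vector_entry_address(interrupt_type: int) -> int:
    """Address of the 4 byte vector entry for an 8 bit interrupt identifier."""
    if not 0 <= interrupt_type <= 0xFF:
        raise ValueError("interrupt identifier must fit in 8 bits")
    return interrupt_type * 4


def read_vector(memory: bytes, interrupt_type: int) -> Tuple[int, int]:
    """Return (segment, offset) of the service routine stored in the table."""
    base = vector_entry_address(interrupt_type)
    offset = int.from_bytes(memory[base:base + 2], "little")
    segment = int.from_bytes(memory[base + 2:base + 4], "little")
    return segment, offset


if __name__ == "__main__":
    table = bytearray(1024)  # 256 vectors of 4 bytes each
    # Hypothetical entry: identifier 0x20 points at F000:1234.
    table[0x80:0x84] = bytes([0x34, 0x12, 0x00, 0xF0])
    print(hex(vector_entry_address(0x20)), read_vector(bytes(table), 0x20))
```
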
Interrupt requests may originate from 18
sources without the necessity of external hardware. Two
jumper-selectable interrupt requests can be automatically
generated by the Programmable Peripheral Interface (PPI) when
a byte of information is ready to be transferred to the 8086
CPU (i.e., input buffer is full) or a byte of information has
been transferred to a peripheral device (i.e., output buffer
is empty). Two jumper-selectable interrupt requests can be
automatically generated by the USART when a character is
ready to be transferred to the 8086 CPU (i.e., receive channel
buffer is full) or when a character is ready to be transmitted
(i.e., transmit channel data buffer is empty). A jumper-selectable
interrupt request can be generated by two of the
programmable counters and eight additional interrupt request
lines are available to the user for direct interfaces to
user designated peripheral devices via the Multibus interface.
One interrupt request line may be jumper routed directly from
a peripheral via the parallel I/O driver/terminator section
and one power fail interrupt may be input via auxiliary
connector P2.





The iSBC 8612A board includes the resources
for supporting a variety of original equipment manufacturer
system requirements. For those applications requiring
additional processing capacity and the benefits of multiprocessing
(i.e., several CPUs and/or controllers logically sharing
systems tasks with communication over the Multibus interface),
the iSBC 8612A board provides full bus arbitration control
logic. This control logic allows up to three bus masters
(e.g., combination of iSBC 8612A board, DMA controller,
diskette controller, etc.) to share the Multibus interface
in serial (daisy-chain) fashion or up to 16 bus masters to
share the Multibus interface using an external parallel
priority resolving network.

The Multibus interface arbitration logic 
operates synchronously with the bus clock, which is derived 
either from the iSBC 8612A board or can be optionally gen- 
erated by some other bus master. Data, however, are trans- 
ferred via a handshake between the controlling master and the 
addressed slave module. This arrangement allows different 
speed controllers to share resources on the same bus, and 
transfers via the bus proceed asynchronously. Thus, the 
transfer speed is dependent on transmitting and receiving 
devices only. This design prevents slower master modules
from being handicapped in their attempts to gain control of
the bus, but does not restrict the speed at which faster
modules can transfer data via the same bus. The most obvious
applications for the master-slave capabilities of the bus
are multiprocessor configurations, high speed direct memory
access (DMA) operations, and high speed peripheral control,
but are by no means limited to these three.

Adding the optional iSBC 300 Multimodule
RAM to the iSBC 8612A board allows the on-board RAM to be
expanded by 32K (for an on-board total of 64K). If the
optional iSBC 340 Multimodule EPROM is installed on the iSBC
8612A board, the amount of on-board ROM/EPROM can be expanded
by 16K (for an on-board total of 32K).

b. Special Processing Elements 

Special purpose processing elements will also 
be used in this system to enhance processing capabilities. 
Typical examples are array processors, FFT, correlators, etc. 
However, they have not been included in this thesis project. 

c. Memories 

Three types of memories are provided. 

(1) Secondary Memory. It consists of two magnetic
cartridge hard discs and a dual drive floppy diskette
system. The magnetic hard disc is manufactured by the DYNEX
Company and has a storage capacity of 10 megabytes. This
hard disc system is connected to the system Multibus, and thus
allows a fast data transfer rate and has DMA capability. Its
interface to the Multibus is made by the Interphase Corp.
The dual floppy diskette drive is a part of the Intel MDS-220
development system.

(2) Primary Memory. It consists of dynamic
RAM and EPROM (Erasable Programmable Read Only Memory). The
EPROMs reside in each SBC (8K bytes to 16K bytes per SBC). They
can be used as the monitor storage, and to store part of the
operating system. The RAMs reside in two types of physical
locations. The first location is on each SBC and has a
capacity up to 64K bytes. The second type of location is on
separate RAM boards. A 128K byte RAM board developed by the
MUPRO Company is used. The RAM in the SBC is a dual ported
RAM which can be shared with other SBCs via the Multibus
interface. Part or all of the dual ported RAM can be made
accessible only to the on-board CPU; in other words, made
"private" and "unshared" to the SBC. The stand-alone RAM
boards are shared with other SBCs via the Multibus interface.

d. Memory Hierarchy 

The primary memory is partitioned
according to the following hierarchical scheme.


A) Private Unshared Memory - RAMs available on each SBC
which can be accessed only by the on-board CPU.

B) Internal Global Shared Memory - Internal global
shared RAM available on each SBC and special RAM
boards. The on-board RAM in the SBC is a dual
ported RAM and can be accessed by any SBC which
is a member of that cluster (inaccessible to PEs
in other clusters). See Section C.3.a.(1).

C) External Global Shared Memory - External global
shared RAMs reside in special RAM boards and/or in
dual ported RAM of the SBCs. These memories can be
accessed by any SBCs in the same "star," and any
SBCs in the corresponding clusters in neighboring
stars.





Using this memory hierarchy, the total address
space can be expanded from the physical memory address space
of each CPU. The 8086 microprocessor has 20 address lines so
its physical address space is $2^{20}$ = 1,048,576 bytes, or
1M bytes.


In this implementation, the total address space
(memory space) for a single star is partitioned in the
following way:

                             6 µC in each cluster       8 µC in each cluster

(1) Private Memory           2·64K + 4·(64K - 8K)       4·64K + 4·(64K - 8K)
                             = 352K bytes/cluster       = 480K bytes/cluster
                             = 360,448 bytes/cluster    = 491,520 bytes/cluster

(2) Internal Global Memory   1M bytes · 3/4             1M bytes · 3/4
                             = 768K bytes/cluster       = 768K bytes/cluster
                             = 786,432 bytes/cluster    = 786,432 bytes/cluster

(3) External Global Memory   32K bytes/cluster          32K bytes/cluster
                             = 32,768 bytes/cluster     = 32,768 bytes/cluster

(1K bytes = 1024 bytes.)

As described before, a "star" consists of four clusters,
thus the total memory space for a single star is:

                             6 µC/CL                    8 µC/CL

                             4·(352K + 768K + 32K)      4·(480K + 768K + 32K)
                             = 4,608K bytes/star        = 5,120K bytes/star
                             = 4,718,592 bytes/star     = 5,242,880 bytes/star


This expanded memory space can be determined in general as:

MS  = Memory space
CL  = Number of clusters in a "star"
PM  = Private memory. In K bytes.
GIM = Global internal memory. In K bytes.
GEM = Global external memory. In K bytes.
N   = Number of SBCs in each cluster.

$$MS = CL \cdot \left( \sum_{i=1}^{N} PM_i + GIM + GEM \right) \qquad (3.0)$$

If all SBCs are assigned the same amount of private memory,
then (3.0) becomes

$$MS = CL \cdot \left( N \cdot PM + GIM + GEM \right) \qquad (3.1)$$
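
As a quick arithmetic check of Eq. (3.0), the sketch below recomputes the memory space of one star for the 6 and 8 microcomputer cases, using the per-cluster figures quoted in the partition above (768K internal global, 32K external global, and 64K or 64K - 8K of private memory per SBC). The function and variable names are illustrative only.

```python
from typing import Sequence

K = 1024  # 1K bytes = 1024 bytes


def memory_space_kbytes(clusters: int, private_kbytes: Sequence[int],
                        gim_kbytes: int, gem_kbytes: int) -> int:
    """Eq. (3.0): MS = CL * (sum of PM_i + GIM + GEM), all in K bytes."""
    return clusters * (sum(private_kbytes) + gim_kbytes + gem_kbytes)


# Private memory per SBC in one cluster, in K bytes.
SIX_SBCS = [64] * 2 + [64 - 8] * 4    # 352K per cluster
EIGHT_SBCS = [64] * 4 + [64 - 8] * 4  # 480K per cluster

if __name__ == "__main__":
    for name, pm in (("6 uC/CL", SIX_SBCS), ("8 uC/CL", EIGHT_SBCS)):
        ms = memory_space_kbytes(4, pm, gim_kbytes=768, gem_kbytes=32)
        print(f"{name}: {ms}K bytes/star = {ms * K:,} bytes/star")
```

Both results agree with the totals quoted for a single star (4,608K and 5,120K bytes).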


The memory space is computed for both 6 and 8 microcomputers
in a cluster mainly
because of power supply considerations. The available power
supply can handle up to 6 SBCs in a cluster. However, the
controller for intercommunication is designed for 8 SBCs.

4. Intercommunication Network 

In order to establish fast, reliable communication with a high
degree of fault tolerance among SBCs of different
clusters and stars, three level communication controllers
were designed, built and tested. They include a combination
of random priority, distributed, and central controllers as
shown in Fig. 3.4 for a single star. Each cluster has its
own distributed controller. Each star has four such
controllers. The four clusters share one central controller. The
four distributed controllers are identical, and have some
degree of programmability.
a. Distributed Controllers (DC) 

A block diagram of the distributed controller is 
depicted in Fig. 3.5. It resides on a single board located 
in each cluster. Its primary functions are the following: 


1) Arbitration among Internal/External bus requests
from within and outside the cluster.

2) Priority resolving.

3) Intra-cluster advance activities monitoring.

4) Interacting with the central controller.

5) Deadlock avoidance.


b. Random Priority Controller (RPC)

The RPC is a bus contention resolver based on
a binary tree approach. The RPC accepts up to eight "Bus
Requests" (BREQ) and issues a single "Bus Priority In" (BPRN)
signal. BREQ is a signal generated by the bus arbiter which
resides on-board the SBC to indicate that this particular SBC
requires the control of the cluster system bus (Multibus) for
one or more data transfers. BPRN is a signal generated by
the RPC to indicate to the requesting SBC that control of the
cluster bus is granted. Prior to issuance of a BPRN, the
RPC generates an "advanced bus priority in" signal (BPRN*),
which is sent to the ICAAM (intra-cluster advance activities
monitor) as a "port selector" signal. This signal starts
a chain of logical activities which eventually causes the DAC
(deadlock avoidance circuit) to send two signals, i.e., BHD
(bus hold) and PRE (priority enable), to the RPC. When the
appropriate BHD and PRE are received by the RPC, it will
generate the BPRN signal. BHD is a positive logic signal
which enables the tristate output of the RPC to allow BPRN*
to propagate and become a BPRN signal, when the PRE signal
is enabled. If BHD goes low, it disables all BPRN*. PRE is
a negative logic signal which is generated in the DAC circuit.
When the PRE signal is generated, it disables requests from
other clusters and enables the output driver of the RPC to
send the BPRN.

The RPC has an internal clock to synchronize its
arbitration function. More details can be found in Section
na. b. 

ICAAM (Intra-Cluster Advance Activities Monitor)
has a multiplexer which selects two signals, MSBT (most
significant address bits, 5 bits out of 20) and ADRDC/ADWTC
(advance read command/advance write command), when a BPRN*
is received from the RPC. By analysing the MSBT, the ICAAM
generates a bus request of one of the following types:

1) Intra-cluster bus request. It is a request for the
system bus in the same cluster only. In response
to this request, the ICAAM generates an IREQ signal.

2) Inter-cluster bus request. It is one out of four
cluster requests generated by the ICAAM of the
distributed controller. Each CLREQ* requests
three resources: one system bus of the requesting
cluster, one system bus of the requested cluster
and one inter-connecting bus switch. Following a
CLREQ*, the ICAAM also creates an EXREQ for the CIC
(coincidence inhibit circuit).

3) Inter-star bus request. This request, labeled
STREQ*, involves three resources: the system bus
of a cluster in the requesting star, the system
bus of the corresponding cluster in the requested
star, and the inter-connecting bus switch between
these two stars. Following a STREQ* signal, the
ICAAM also creates an EXREQ for the CIC.


The ICAAM also generates an advance read command
(ADRDC) or advance write command (ADWTC) before the
corresponding read command (MRDC) or write command (MWTC) is
generated by the bus controller of the requesting SBC. This
is done by monitoring the activities of the CPU of the
requesting SBC before the CPU is granted the system bus. Those
signals are needed to determine the direction of the drivers
in the bus switch in advance, so that all switching transients
are settled before a data transfer takes place.

CIC (Coincidence Inhibiter Circuit) - The CIC
accepts five signals as inputs: one STPRN (star priority in),
three CLPRN (cluster priority in) from the central controller and
one IREQ/EXREQ from the ICAAM. It generates one output signal,
INH (inhibit), for the DAC (deadlock avoidance circuit). The
primary function of the CIC is to inhibit a BPRN from the RPC
in case a CLREQ* or STREQ* was issued by the ICAAM,
until either a CLPRN or a STPRN is granted by the central
controller to the CIC. The INH signal is needed
to prevent the system bus from being tied up while waiting for
the inter-cluster request to be granted; this allows efficient bus
usage and reduces bus contention.

DAC (Deadlock Avoidance Circuit). A "deadlock"
is a situation in which two processes are unknowingly waiting
for resources that are held by each other and thus
unavailable [192]. More details can be found in Section C.5.d.,e.
The primary function of the DAC is to prevent deadlock. Its
principle is similar to the "Suspend" Lock method [Ref. 193].
The DAC accepts four input signals: ANREQ (any request),
INH, STREQ, CLREQ and generates three signals: BHD (bus
hold), PRE (priority enable) and CL/STPRN. Three cases will
be described to explain the operation of the DAC, depending on
the order of occurrence of the CLREQ (or STREQ) and the INH
signals.


(Case 1) - If CLREQ (or STREQ) occurs prior to
the INH signal, the CL/STPRN signal will be granted. In this
case, BHD will go low and PRE high, thus freezing the selected
request in the RPC and disabling the BPRN*, which will release
all the resources held by the appropriate SBC via the BPRN*
signal (ICAAM, CCU-I). About 30 nsec later, a CL/STPRN
will be generated by the DAC. This allows the appropriate
processing element to be granted the system bus.

(Case 2) - If the CLREQ (or STREQ) signal occurs
after the INH signal, the CL/STPRN signal will be blocked.
It indicates that the system bus is in use. In this case,
BHD is high and PRE goes low, and BPRN will be granted.

(Case 3) - If the INH signal and CLREQ (STREQ)
signal occur simultaneously within a time window of 15 nsec,
the CLREQ (or STREQ) signal will be blocked as before. In
case of any occurrence of a transient CL/STPRN signal, the
"GLITCH KILLER" will suppress it and prevent the transient
from propagating to the central controller.
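
The three cases can be summarized in a small behavioral sketch. It is not a model of the actual DAC hardware: the names, the use of arrival times in nanoseconds, and the treatment of the exact 15 nsec boundary are assumptions made only for illustration. PRE is represented by its logic level (low means asserted, since it is a negative logic signal), and BPRN is taken to propagate only when BHD is high and PRE is low, as described for the RPC.

```python
from typing import NamedTuple, Optional

COINCIDENCE_WINDOW_NS = 15.0  # window quoted for Case 3


class DacOutputs(NamedTuple):
    bhd: bool       # bus hold, positive logic level
    pre: bool       # priority enable logic level (low = asserted)
    cl_stprn: bool  # external (cluster/star) request granted
    bprn: bool      # local bus priority granted by the RPC


def dac_decide(t_request_ns: float, t_inh_ns: Optional[float]) -> DacOutputs:
    """Classify a CLREQ/STREQ arrival against the INH signal (Cases 1 to 3)."""
    # Case 1: the external request clearly precedes INH (or INH is absent).
    external_first = (t_inh_ns is None
                      or t_inh_ns - t_request_ns >= COINCIDENCE_WINDOW_NS)
    if external_first:
        bhd, pre, cl_stprn = False, True, True
    else:
        # Case 2 (request after INH) and Case 3 (coincidence): blocked.
        bhd, pre, cl_stprn = True, False, False
    bprn = bhd and not pre
    return DacOutputs(bhd, pre, cl_stprn, bprn)


if __name__ == "__main__":
    print(dac_decide(0.0, 100.0))  # Case 1
    print(dac_decide(100.0, 0.0))  # Case 2
    print(dac_decide(0.0, 10.0))   # Case 3
```
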

c. Central Controller (CC) 

The central controller is a single board controller,
which consists of two clocks and four identical units,
each corresponding to one cluster in the star. The primary
functions of the CC are:


1) To arbitrate among different CLREQ and STREQ to a 
Single cluster. 


2) Enable and disable the CL/STPRN signal chain. 


3) Enable and disable the appropriate bus switch links 
of the complete star switch. 


A block diagram of the CC is presented in Fig. 3.6.

CLK-1 is the main clock of the central
controller; its frequency is 30 MHz. It is used to synchronize
and enable the arbitration function of the CSRA (cluster/
star request arbiter) and the four-phase clock, CLK-2.

CLK-2 is a four-phase, anti-coincidence
clock. Its input is CLK-1, from which it generates four clocks, one
each for the four CSRAs. The function of the four-phase clock
is to synchronize the CLREQ (or STREQ) chain action via
the CSRA in order to prevent deadlocks. The deadlock
avoidance method used in this implementation is similar
to the "spinning lock" method [192]. The spinning
lock is rotating at a frequency of 3.75 MHz (30/8 MHz).

Figure 3.6 A block diagram of the central controller.


CSRA (Cluster/Star Request Arbiter) - The CSRA
is a rotating priority resolver. Its primary functions are:

1) To arbitrate among requests from three other clusters
within the same star and from the corresponding
cluster in the neighboring star.

2) To enable the selected request, after being synchronized
with the spinning lock, to propagate to the
requested cluster.

The CSRA accepts four different requests to a single cluster
and grants one of them according to a rotating priority scheme.
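
A rotating priority scheme of this kind can be sketched in a few lines. The thesis does not spell out the rotation rule used in the CSRA hardware, so the sketch assumes the common round-robin convention in which the input just granted becomes the lowest priority; the class and method names are hypothetical.

```python
from typing import Optional, Sequence


class RotatingPriorityArbiter:
    """Grant one of several request lines using round-robin rotation."""

    def __init__(self, inputs: int = 4) -> None:
        self.inputs = inputs
        self.highest = 0  # index that currently has the highest priority

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        """Return the index of the granted request, or None if none pending."""
        if len(requests) != self.inputs:
            raise ValueError("one request flag per input is required")
        for step in range(self.inputs):
            candidate = (self.highest + step) % self.inputs
            if requests[candidate]:
                # Assumption: the granted input drops to the lowest priority.
                self.highest = (candidate + 1) % self.inputs
                return candidate
        return None


if __name__ == "__main__":
    arbiter = RotatingPriorityArbiter()
    pending = [True, True, False, True]  # three clusters plus one star request
    print([arbiter.grant(pending) for _ in range(4)])
```

With this rule no requester can be starved: a line that keeps requesting is served after at most three other grants.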

CSPE (Cluster/Star Priority In Enable) - The CSPE
is a demultiplexer whose primary function is to enable the
CL/STPRN chain action. The CSPE is synchronized by the CSRA.
When a CLPRN is received from the requested cluster, the CSPE
will enable the CLPRN chain action to the selected requesting
cluster.

SSEC (Star Switch Enable Circuit) - The SSEC
consists of a set of six drivers. It accepts the different
CLPRNs and generates the signals ECC, DIR, and $\overline{\mathrm{DIR}}$. ECC is a
negative logic signal which enables one of the bus switch
links corresponding to the CLPRN signal. DIR is a signal
which sets the requesting direction of the drivers in the
selected link of the "complete star" bus switch. $\overline{\mathrm{DIR}}$ is
the inverted DIR signal. The SSEC is responsible for the
enabling of the six different links of the complete star bus
switch as depicted in Fig. 3.7.

5. Intercommunication Procedures Among Resources

Communication among the resources of this system is
governed by the following basic concepts: explicitly segmented
memory; an unshared local and shared global internal/
external memory hierarchy; an asynchronous process structure; and
a design decision that each single board computer is allowed
to use the system bus for transfer of only one word of data
and then must release the system bus to other SBCs, except
when a prefix lock is executed by software. A software lock
will grant the bus to that SBC for any length of time needed
by that SBC. In general, this feature is not required
frequently, so the operating system will not normally be delayed
waiting for the system bus to be released in order to test a
semaphore, or any other synchronization primitives.

In order to provide effective communication among all
processing elements (within a single cluster, among different
clusters in a single "star," and among "stars") and to
arbitrate the contention of bus usage (in star bus switch and
inter-star bus switches), we have developed an
intercommunications system managed by distributed and central controllers,
as described in Chapter III.D.4.,5.

In order to describe the communication protocol among
different SBCs, a two "star" system is chosen - STAR-1, STAR-2
as depicted in Fig. 3.8. Several examples of different
types of communication are presented.
a. Example #1 - Intra-Cluster Communication 

Intra-cluster communication is accomplished by
means of data transfer via the cluster Multibus. This type
of communication does not involve the central controller or
any bus switch. The distributed controller resident in the
specific cluster and on-board SBCs are the controllers of
this communication link.

For example, let us assume SBC-1 in cluster A1
requests some information from SBC-2 in the same cluster.
The sequence of events (Fig. 3.9) is:

a) SBC-1 generates a BREQ signal.

b) The RPC of the distributed controller will grant
the request and generate a BPRN* signal.

c) The ICAAM of the distributed controller will
generate an IREQ signal for the inhibiter.

d) In response to the IREQ, the CIC generates an inhibit
signal which causes the DAC to send appropriate
BHD and PRE signals.

e) These two signals are sent to the RPC to close the
chain and a BPRN is generated.

f) The BPRN signal is applied to the arbiter circuit
of the corresponding SBC. From this point, a
regular Multibus transfer is executed.
These six events are necessary to establish any
intra-cluster communication. But they are not sufficient.
The following conditions corresponding to the requests from
other clusters and stars must be examined:

1) Is there any other cluster in process of communication
with this cluster?

2) Is there any other star in process of communication
with this cluster?

For simplicity of this example, we assumed that
no external requests were involved in the process of
intra-cluster communication.

Upon termination of the data transfer via the
system bus, SBC-1 releases its BREQ signal, which releases
the resources held by SBC-1. The average time of word transfer
is 1.65 µsec.


b. Example #2 - Inter-Cluster Communication
(within a Star)


Inter-cluster communication is accomplished by
means of data transfer via two clusters' system buses (Multibus)
and the bus switch interconnecting those two clusters.
This type of communication involves all controllers, the star
bus switch, and the on-board SBC arbiter. (See Fig. 3.10).

Assume that SBC-1 in cluster A1 requests some
information from SBC-1 in cluster B1. The sequence of events
is:

1) SBC-1 of A1 generates a BREQ signal.

2) The RPC of the distributed controller in cluster A1
locks on the request and generates a BPRN* signal.

3) The BPRN* signal is applied to the ICAAM of the
distributed controller.

4) The ICAAM generates two signals: CLREQ-B1, which
propagates to the rotating priority arbiter of the
central controller, unit B, and "EXREQ," which is
applied to the CIC (coincidence inhibiter) of the
distributed controller of cluster A1.

5) The CIC (coincidence inhibiter) generates an
appropriate INH signal which will cause the distributed
controller in cluster A to wait for a CLPRN from
the demultiplexer of the central controller, unit B.

6) The "cluster/star request arbiter" in the central
controller locks on the CLREQ-B1 signal and waits
for the spinning lock to enable the CLREQ chain
action and locks on the request.

7) The CLREQ signal is applied to the DAC of the
distributed controller of cluster B1.

8) The DAC of the distributed controller of cluster B1
generates a CLPRN signal which is applied to the
demultiplexer of unit B of the central controller.

9) The central controller enables the CLPRN signal to
the "DAC" of the distributed controller in cluster
A, which generates appropriate BHD and PRE signals.

10) The BHD and PRE signals are applied to the RPC and
close the chain action. The RPC then generates
the BPRN signal.

11) The BPRN signal is applied to the on-board SBC-1
arbiter, which starts the regular Multibus
communication.

12) After event #9, a parallel process is initialized.
This process is the bus switch enable. Two signals,
DIR and ECC, are sent to the bus switch which links
the buses of cluster A1 and cluster B1.

13) These two signals prepare the switch for the coming
data transfer.


The initialization of the bus switch terminates
200 nsec before the transfer of data via the bus (switch).
This feature makes the bus switch transparent to the requesting
cluster, and both clusters are linked on a longer system
bus for the time the transfer takes place. SBC-1 in cluster
A1 can use the "longer" system bus (two system buses and the
bus switch) for more than one word transfer, if this feature
is requested by a software bus lock instruction from SBC-1.
Termination of this process is started by releasing the BREQ
signal by SBC-1 of cluster A1. This event releases all
resources held by SBC-1 of cluster A1.

The sequence of events described in this example 
is necessary for this type of communication. Other external 
events were not introduced in order to simplify the example. 
This sequence of events takes place in an average time of 
Berl Sec. 

c. Example #3 - Inter-Star Communication 

Inter-star communication is accomplished by means 
of data transfer via the system buses of two clusters and the 
bus switch interconnecting these two clusters. This type of 
communication involves all controllers, and the bus switch
interconnecting the two clusters. The sequence of events is
similar to the previous example. Instead of the CLREQ signal,
a STREQ signal is applied to the central controller. The
responding signal is STPRN (see Fig. 3.11).

Examples 1, 2, and 3 described a case of separable 
communication levels. In a real application, the situation 
can be more complicated. For example, a simultaneous
combination of the three different examples is possible. In
such a case, deadlocks could occur frequently [193]. In
order to prevent those deadlocks, two methods of deadlock
avoidance are used.

Suspend lock -- This method is implemented in
the DAC of the distributed controller. In order to explain
how this method works, the following example is used.


d. Example #4 - Deadlock Avoidance I -
Suspend Lock


SBC-i in cluster A1 of star 1 requests SBC-j in
cluster A2 of star 2 (process P1), and SBC-k in cluster A2 of
star 2 requests SBC-l in cluster A1 of star 1 (process P2),
{i, j, k, l >= 1}. Let us assume that at a given time the two request
processes P1 and P2 progress to state No. 3 (Fig. 3.12).
At this point of execution, the processes P1, P2 are holding
the following resources:

P1 -> {RPC-DC-A1, ICAAM-DC-A1, CSRA/CCB1, DAC-A1, CIC-A1}
P2 -> {RPC-DC-A2, ICAAM-DC-A2, CSRA/CCA2, DAC-A2, CIC-A2}


At this point of execution, each process requests the DAC 
located in the other distributed controller. But the two 
DACs are held by the requesting processes and are unavailable. 
It seems that we have a deadly embrace situation (deadlock). 
The DAC is designed to avoid such a case. One
of the DACs (which will be called the first DAC, depending upon
the time of arrival of the requests) will suspend the lock
of the second DAC by releasing some of the resources that
are held by the second requesting process. This way the
first requesting process will be advanced while the second
will be suspended and wait for the first process to terminate.
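
The suspend lock rule can be illustrated with a minimal Python sketch. It is a software analogy only, not the DAC circuit: the `Request` class, the `arrival` field, the tie-break on equal arrival times, and the choice to release only the resources whose names start with "DAC" are all assumptions made for illustration. The thesis states only that "some of the resources" of the second process are released.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """A requesting process (hypothetical model, not the DAC hardware)."""
    name: str                      # e.g. "P1"
    arrival: int                   # arrival time of the request, arbitrary units
    held: set = field(default_factory=set)
    suspended: bool = False


def suspend_lock(p, q):
    """Resolve a mutual wait between two requests.

    The request that arrived first proceeds; the other one is suspended
    and gives up its DAC so that the first request can complete.
    Assumption: ties are broken in favour of the first argument.
    """
    first, second = (p, q) if p.arrival <= q.arrival else (q, p)
    released = {r for r in second.held if r.startswith("DAC")}
    second.held -= released        # second process releases its DAC
    second.suspended = True        # and waits for the first to terminate
    first.held |= released         # first process now holds the DAC it needed
    return first, second


p1 = Request("P1", arrival=0, held={"RPC-DC-A1", "DAC-A1"})
p2 = Request("P2", arrival=1, held={"RPC-DC-A2", "DAC-A2"})
winner, loser = suspend_lock(p1, p2)
print(winner.name, sorted(winner.held), loser.name, loser.suspended)
```

In this sketch P1 ends up holding both DACs and can advance, while P2 is marked as suspended, mirroring the outcome described above.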


This deadlock could happen if the suspend lock method is not
used when the two requesting clusters are located in different
stars, because the two spinning locks of the two central
controllers are not synchronized. Therefore, the spinning
lock function is limited for inter-star communication. This
is the reason for having two types of deadlock avoidance
methods. The suspend lock method is used to prevent deadlock
for inter-star communication. Synchronizing
the spinning locks of the different central controllers of a
multi-star system is not desirable for fault tolerance, and
sometimes it may not be possible to synchronize them.

The second method of deadlock avoidance is the
"spinning lock" method. This method is used to prevent
deadlocks which may occur in inter-cluster or intra-cluster
communication within the same star. If for any reason this
method fails to prevent a deadlock, the "suspend lock" method 
will take over and prevent the deadlock. The reason for 
using two different methods is to reduce the overhead created 
by the suspend method and to increase fault tolerance. 

The clock in the central controller is a four-phase
anti-coincidence clock as shown in Fig. 3.22. This clock is
the "spinning lock" generator.


e. Example #5 - Deadlock Avoidance II -
Spinning Lock (Fig. 3.12)


Let us assume that SBC-i in cluster A requests
SBC-j in cluster B and SBC-k in cluster B requests SBC-l in
cluster A. These requests are all for SBCs residing in the
same "star." If the two requests are sent simultaneously to
the CSRA of CCA and the CSRA of CCB, respectively, of the central
controller, they eventually will progress to the deadlock
condition as explained in Example #4. In order to prevent
this possibility, the CSRA of the central controller is
designed with two "lock in request" phases.


1) The first phase is implemented by the rotating
priority arbiter.

2) The request selected by the first arbiter propagates
to the "spinning lock" circuit which will lock on
the request only when CLK-2 goes low.


CLK has four phases. Since only one goes low at any given
time, it is impossible for both requests to leave the central
controller at the same time for the distributed controller of
the requested cluster; this eliminates the race condition
and the deadlock. A race condition occurs when the scheduling
of two processes is so critical that the various orders of
scheduling them result in different processing [192]. The
minimum time difference caused by the spinning lock to the
requesting process is equal to the anti-coincidence time
of CLK-2 (Fig. 3.22).
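
The effect of a four-phase anti-coincidence clock can be sketched in Python as follows. This is a discrete-time illustration under stated assumptions, not a model of the actual circuit: the function names, the mapping of one phase per controller section, and the unit "tick" are hypothetical.

```python
def four_phase_clock(ticks, phases=4):
    """Yield one tuple of phase levels per tick (1 = high, 0 = low).

    Exactly one phase is low on each tick, so the phases never coincide.
    """
    for t in range(ticks):
        yield tuple(0 if t % phases == k else 1 for k in range(phases))


def lock_times(pending, ticks=8, phases=4):
    """Return the tick at which each pending request is locked.

    `pending` maps a section index (0..phases-1) to the tick at which its
    request was selected. A request is locked on the first tick at or
    after selection on which that section's phase is low.
    """
    locked = {}
    for t, levels in enumerate(four_phase_clock(ticks, phases)):
        for section, ready in pending.items():
            if section not in locked and t >= ready and levels[section] == 0:
                locked[section] = t
    return locked


# Two requests selected at the same instant in sections 0 and 1
print(lock_times({0: 0, 1: 0}))
```

Because the two sections are served by different phases, the two simultaneous requests are locked on different ticks and cannot leave the central controller together.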
6. Multibus Communication 

Two arbitration circuits are used in Multibus
communication: the on-board SBC arbiter, called the Bus Arbiter,
and the RPC of the distributed controller.

The Bus Arbiter provides several resolving techniques
based on a priority concept that at a given time one SBC will
have priority above all the rest. The RPC can be regarded as
a parallel priority resolver. A parallel priority resolving
technique has a separate bus request (BREQ) line for each
arbiter on the system bus (Multibus). Several BREQ lines enter
the RPC input. For each BREQ line, there is a corresponding
BPRN (bus priority in) line at the output of the RPC.

Only one BPRN signal can be activated at any given time.
The signal BPRN is returned to the highest priority requesting
bus arbiter. The bus arbiter receiving priority (BPRN
active low) then allows its associated SBC onto the multi-
master system bus as soon as the bus becomes available (i.e.,
it is no longer busy). When one bus arbiter gains priority
over another arbiter, it cannot immediately seize the bus.
It must wait until the present bus occupant completes its
transfer cycle. Upon completing its transfer cycle, the
present bus occupant recognizes that it no longer has priority
(BPRN goes high) and surrenders the bus, releasing the Busy
signal. Busy is an "active low" signal line which goes to
every bus arbiter on the system bus and is tied with other Busy
signals by an "OR" gate. When "Busy" goes high, the
arbiter which presently has bus priority (BPRN active low)
then seizes the bus and pulls "Busy" low to keep other arbiters
off the bus. (See waveform timing diagram, Fig. 3.13.)
Note that all multi-master system bus transactions are
synchronized to the bus clock (BCLK). This gives the parallel
priority resolving circuit time to settle and make a correct
decision. Fig. 3.14 depicts the interconnections between the
bus arbiters and the RPC.
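
The BREQ/BPRN/Busy handshake described above can be summarized in a short Python sketch. It is a logical illustration only: the function names are hypothetical, and the explicit `priority` ordering passed to `resolve` is an assumption standing in for whatever ordering the RPC applies at a given moment.

```python
def resolve(breq, priority):
    """Parallel priority resolution.

    breq     -- list of booleans, True where an arbiter asserts BREQ
    priority -- arbiter indices, highest priority first (assumed given)

    Returns the BPRN levels (active low): 0 for the single winner,
    1 for every other arbiter. At most one BPRN is ever low.
    """
    bprn = [1] * len(breq)
    for i in priority:
        if breq[i]:
            bprn[i] = 0
            break
    return bprn


def may_seize_bus(bprn_level, busy_level):
    """An arbiter takes the bus only when it holds priority (BPRN low)
    and the current occupant has released the bus (Busy high)."""
    return bprn_level == 0 and busy_level == 1


bprn = resolve([False, True, True], priority=[2, 0, 1])
print(bprn)                          # arbiter 2 wins
print(may_seize_bus(bprn[2], 0))     # bus still busy: must wait
print(may_seize_bus(bprn[2], 1))     # bus released: may seize it
```

The second function captures the point made in the text that gaining priority is not sufficient; the winner still waits for the present occupant to finish its transfer cycle.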

Fig. 3.13 Timing Diagram of Bus Arbiter and Random Priority

In our configuration, every master currently using
the bus will surrender the bus upon completing its transfer
cycle (unless a bus lock is executed). This property is
accomplished by tying all CBREQ (common bus request) lines
of all bus arbiters to ground. CBREQ is an active low signal
which indicates to the current master on the bus that the bus
has been requested by another master.

Two other signals, LOCK and CRQLCK, lend to the
flexibility of the bus arbiter within the system configuration.
LOCK is a signal generated by the processor to prevent the
bus arbiter from surrendering the multi-master system bus to
another master, either higher or lower priority. CRQLCK
(common request lock) serves to prevent the bus arbiter from
surrendering the bus to a lower priority bus master when
conditions warrant it. LOCK is used for implementing software
semaphores for critical code sections and real time critical
events (such as memory refresh or hard disc transfer).

In the three different types of communication we
referred to the PRN and REQ chains. The following state
diagrams depict those chains:

1) Intra-cluster communication
[State diagram; legible labels: BREQ, BPRN]

2) Inter-cluster communication
[State diagram]

3) Inter-star communication
[State diagram; legible labels: BPRN, STPRN*]


D. PRESENTATION OF RESULTS

1. Introduction

The important hardware components developed in this
thesis to support this multiple microcomputer system are the
following:

Interconnection:
Intra-cluster -- Multibus
Inter-cluster -- Complete-Star Bus Switch Network

Intercommunication Control (three levels):
Random-Priority Controller
Distributed Controller
Central Controller


In this section, we will present representative test
results to answer two major questions.

1) Did our design work?

2) How well did it work?


Since the Multibus is developed by Intel and is well docu- 
mented [196], we decided not to report its operations here. 
We will describe the operational results of the bus switch 
and the three levels of intercommunication control. 

How well they work together in a computational 
environment will be reported in Chapter IV where the imple- 
mentation of an adaptive spatial filter on the multiple 
microcomputer system will be described. 

2. Bus Switches 

The function of a bus switch is to transmit a signal
from the Multibus in one cluster to the Multibus in another
cluster. For four clusters, the "complete star bus switch
network" designed has six branches of bus switches as shown
in Fig. 3.7. Although Intel's Multibus has 86 lines,
we decided that only 58 of them need to be switched to
facilitate communication between two SBCs from different
clusters. Therefore, one "bus switch" includes appropriate
circuits to transmit 58 signals, including data, address and
control signals.
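
The count of six branches follows from connecting every pair of the four clusters once. A minimal Python check, with the cluster labels A to D and the function name chosen here purely for illustration:

```python
from itertools import combinations


def bus_switch_branches(clusters):
    """List every unordered pair of clusters; a complete network needs
    one bus switch branch per pair, i.e. n(n-1)/2 branches."""
    return list(combinations(clusters, 2))


branches = bus_switch_branches(["A", "B", "C", "D"])
print(len(branches))          # number of bus switch branches
print(len(branches) * 58)     # total switched signal lines at 58 per branch
```

The second figure is simply the branch count multiplied by the 58 switched lines per bus switch quoted above; it is a derived number, not one reported in the thesis.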

Four figures will be used to describe the behavior
of the bus switch. The first three figures are used to show
the improvement of signal waveform before and after the bus
switch. The signals shown are the following:

One data bit - Fig. 3.15a
One address bit - Fig. 3.15b
One control signal - Fig. 3.15c


Each figure consists of two traces. The top trace shows the 
waveform before the switch. The lower trace shows the wave- 
form after the switch. It can be seen that in all three 
cases the waveforms after the switch are better because their 
rise times are all shorter, giving a sharper pulse. It is 
interesting to note the noise appearing on these three signals.
Such noise is typical in the real operational environment. It
should be noted that the control signal in Fig. 3.15c is the
Acknowledge signal (XACK) generated by the SBC requesting the
use of the system bus.

The behavior of the bus switch is described also by
Fig. 3.20 which shows the delay of the switch. Again, the
top trace is before the switch, the bottom trace is after the
switch. The delay is no more than 25 nsec.

These four figures demonstrated that our bus switches
are adequate to provide communication between two Multibuses
running at 10 MHz.

3. Random Priority Controllers (RPC)

The function of the random priority controllers is to
arbitrate the requests for bus usage from many SBCs, either
from the same cluster or from several clusters. If an SBC
from another cluster wants the Multibus to communicate either
with another SBC or with the Global RAM, two higher level
controllers - the central controller and two distributed
controllers associated with this cluster and the other cluster
where the requesting SBC resides - must also participate
in the control function. However, the control ultimately
rests with the RPC because it is the circuit which grants the
bus usage signal, BPRN (Bus Priority In). One RPC is used
in every Multibus, so there are four RPCs in each star.

Figure 3.15 The input and output waveforms of three
selected signals to demonstrate the
performance of the bus switch:
(a) a data bit; (b) an address bit;
(c) a control signal, "Acknowledge" (XACK)

Top trace: Input to the bus switch
Bottom trace: Output of the bus switch


The behavior of our RPC will be described by four
figures using the BPRN signals (Bus Priority In) of the SBCs
requesting the bus. A BPRN low signal means the SBC has been
granted the bus and is using it.

a. Sharing of the Multibus by Two SBCs

Fig. 3.16 shows BPRNs of two SBCs. The bus usage
pattern was created by software. Each unit of low BPRN
represents a transfer of one word. If there is no request for
bus usage by other SBCs, the SBC currently using the bus will
hold it, as shown by the BPRN low signal for a longer period of
time. The figure shows the interleaving of bus usages by
these two SBCs, indicating that the RPC works rapidly and
efficiently to serve these two SBCs.


b. Slow-Down of Bus Release Due to Refresh 
of Dynamic RAM 


However, we discovered that the SBC using the
bus may not release the bus after its one word of transfer,
as shown by a wide gap in Fig. 3.17, although the other SBC
was requesting the bus. We discovered that this is the nature
of Intel's 8612 design. When the dynamic RAM is being
refreshed, the SBC will not release the bus. This is a
drawback we cannot do anything about except to redesign the
8612 SBC.

Figure 3.16 Bus Priority In signals of two SBCs to demonstrate
the arbitration of their usage of the bus by the
random priority controller

Figure 3.17 Bus Priority In signals of two SBCs to demonstrate
the effect of dynamic RAM refresh on the bus usage

Figure 3.18 Bus Priority In signals of four SBCs to demonstrate
the arbitration of their usage of the bus by the
random priority controller

c. Sharing of Multibus by Four SBCs

Fig. 3.18 shows the BPRN signals of four SBCs.
Their general patterns are similar, in the sense that there
is no large gap in any one of these traces, indicating no SBC
is dominating the bus and none is being left out either.
This "uniform" and "equal" treatment of all SBCs requesting
the bus is exactly what the RPC is designed to do.

d. Behavior of RPC When the Bus Is Saturated

We prepared the most severe test for the RPC by
programming four SBCs to request the bus all the time. Of
course, in real applications, this condition should never be
allowed to happen. It represents very poor application
programming. However, it is a tough test for the RPC. Fig. 3.19
shows the BPRN of four SBCs. The interleaving of bus usage
is different from the previous three figures. However,
it is important to note that the bus was first shared by SBC1
and SBC3 for 12 transfers and then shared by SBC2 and SBC4
for another 12 transfers, followed by the repetition of such
a pattern. Two important properties caused this pattern.
First, the RPC is designed based on a binary tree selection.
Therefore, only two SBCs will be granted first, followed by
another pair. Second, the 12 transfers between SBC1 and SBC3
are determined by the basic design of the 8086 instruction
queue which has a FIFO queue of six instructions.

Figure 3.19 Bus Priority In signals of four SBCs which
request the bus usage 100% of the time to
demonstrate the function of random priority
controller


Figure 3.20 Waveforms of input and output signal of a
bus switch to demonstrate the operation
of the switch

Top trace: Input signal to a bus switch
Bottom trace: Output signal waveform from a bus switch

Figure 3.21 Bus Priority In signals of four microcomputers
requesting 20% usage of the
Multibus to demonstrate the operation
of the random priority controller in
this example of heavy bus requests
(80% bus request)

This demonstration clearly indicated that our
RPC is able to arbitrate four SBCs under the most demanding
bus contention situation, which should never be allowed to
happen in a real application.

4. Central Controller 

The function of the central controller is to arbitrate 
requests for inter-cluster and inter-star communication. It 
works jointly with the distributed controllers to search, 
select and synchronize these requests. Although there is only 
one central controller for a star, it has four sections, one 
for each cluster in the star. 

The important components of each section in the 
central controller are CSRA and CSPE. All four sections are 
synchronized by two clocks: CLK1 for the searching and se- 
lecting of requests, CLK2 for their synchronization. 


Two figures will be used to demonstrate their oper- 
ations. 


a. Searching/Selecting Clock (CLK1) and
Synchronization Clock (CLK2)


These two clocks are the heartbeats of the
intercommunication network. It should be realized that CLK2 is
not independent because it is generated from CLK1. Fig. 3.22
shows their mutual relationship. The third trace is CLK1.
Below it are the four-phase CLK2 signals for four clusters.
It is important to note that there is no overlap among them.
This is to avoid any undesirable coincidence. CLK1 is at a
higher clock frequency such that all requests from other
clusters and stars are searched and selected at adequate
rates. Once a request is selected, it is synchronized by
CLK2 and sent on to the appropriate cluster.

b. Searching and Selection of Requests 

Fig. 3.23 shows the functions of CSRA and CSPE
circuits of the central controller A. Four signals are shown
in the top half of the figure representing three cluster
requests from clusters B, C, D and from the cluster A of
another star, respectively. The lower half of this figure
shows the cluster or star grant signals to another star,
cluster D, C and B, respectively. It is important to note
that these CLPRN (or STPRN) signals do not overlap although
the request signals do overlap. It can be seen that cluster
C initiates CLREQ first and got its CLPRN. However, cluster
D initiates CLREQ before cluster C finishes its request. Such
an occasion is generally not allowed in real application
because any SBC is allowed to transfer one word of data and
must release the bus unless a software bus lock is ordered.
However, this test is to challenge the ability of the central
controller. In this case, the CSRA/CSPE of the CCA will allow
cluster C to complete its request first and then award
the CLPRN to cluster D. This figure clearly demonstrated that
with a mix of cluster request signals from three clusters and
one star, some with overlap, some without overlap, the central
controller is able to take in these requests, sort them out,
select one at a time and award "cluster grant" appropriately.





Figure 3.22 Two Clocks In Central Controller For
Searching/Selection and Synchronization
of Requests From Stars and Clusters

CLK1: For Searching and Selection
CLK2: 4 Phase Clock For Synchronization


Figure 3.23 Demonstration of the Functions of CSRA and
CSPE Circuits in the Central Controller
(Section A for Cluster A):
Input to CSRA, Output from CSPE

Four Request Signals To CSRA: From DCB, From DCC, From DCD, From Star A
Four Priority In Signals From CSPE: To Star A, To DCD, To DCC, To DCB





Of course, this is not the completion of the intercommunication
task. The CLPRN will be sent to the distributed controller
to initiate further control actions to complete the
total task of communication between two SBCs.

5. Distributed Controller

The function of the distributed controller is the
same as that of the central controller. They must work with
the RPC to complete the intercommunication. The central
controller is located away from the Multibus and also controls
the operations of all bus switches. The distributed controller
is mounted on the Multibus. Therefore, we have four
distributed controllers in a star. The important components
of each distributed controller are:

ICAAM (Intra-cluster advanced activities monitor)
CIC (Coincidence inhibitor circuit)
DAC (Deadlock avoidance circuit)

Four figures will be used to demonstrate their operations.
Eight control signals in the distributed controller are used
in these figures:

BREQ
CLREQ*
Internal/External Signal
Inhibit
PRE
BHD
CLPRN
BPRN

The first and eighth control signals, BREQ and BPRN,
are two of the most important ones because they are directly
connected to the SBCs. We must remember that all the buses,
switches and controllers are supporting circuits to help the
SBCs to compute and to talk among themselves efficiently. The
SBCs are the originators and receivers of the data and
communication and control signals.

a. Intra-Cluster Communication 

Fig. 3.24 shows the sequence of events in a test
case where one SBC in a cluster wants to talk to another SBC
in the same cluster.

It can be seen that CLREQ* (second trace) is high,
which means no request from another cluster. CLPRN (7th
trace) is therefore also high, i.e., no cluster priority
signal is granted by the central controller.

It is interesting to notice the small delays
between BREQ, PRE and BPRN.

b. Inter-Cluster/Intra-Star Communication 

Fig. 3.25 shows the sequence of events in a test 
case where an SBC in one cluster wants to talk to an SBC in 
another cluster within the same star. 

There are several interesting points when this
case is compared with the intra-cluster case:

° Both BREQ and CLREQ* exist.

° Inhibit signal is active to prevent any premature
generation of BPRN.

° CLPRN is also active to respond to the CLREQ*.

It can be seen that this inter-cluster
communication has been correctly handled by the distributed
controller.





Figure 3.24 Eight Control Signals to Demonstrate
The Function of Distributed Controller
For Arbitration of Intra-Star and
Intra-Cluster Communication

Traces, top to bottom: BREQ, CLREQ, INT/EXT, INH, PRE, BHD, CLPRN, BPRN


Figure 3.25 Eight Control Signals to Demonstrate
The Function of Distributed Controller
For Arbitration of Intra-Star and
Inter-Cluster Communication

Traces, top to bottom: BREQ, CLREQ, INT/EXT, INH, PRE, BHD, CLPRN, BPRN





Figure 3.26 Eight Control Signals to Demonstrate
The Function of Distributed Controller
For Arbitration of Inter-Star Communication

Traces, top to bottom: BREQ, STREQ*, INT/EXT, INH, PRE, BHD, STPRN, BPRN

c. Inter-Star Communication

Fig. 3.26 shows the sequence of events in a
test case where an SBC in one cluster of a star wants to talk
to an SBC in the corresponding cluster of a neighboring star.
They are quite similar to the inter-cluster/intra-star case
in Fig. 3.25 with several changes.

The second trace is now the STREQ* instead of the
CLREQ* signal.

The seventh trace is now the STPRN signal instead
of the CLPRN signal.

The rest of the signals behave quite similarly. It shows
that requests from a cluster in the same star and from a
neighboring star are treated quite the same.

IV. IMPLEMENTATION OF ADAPTIVE FILTER
ON MULTIPLE MICROCOMPUTER SYSTEM


A. INTRODUCTION

1. Selection of Microcomputer

The goal of this thesis research was to eliminate 
the gap between the theoretical development of image process- 
ing algorithms and the experimental development of their 
implementation on some processor systems which are good can- 
didates for practical applications. 

In this thesis, a multiple microcomputer system was 
chosen as the processor system candidate. 

It should be recognized that only during the past
two to three years have 16 bit microcomputers been seriously
considered for signal processing implementations. Although
8 bit microcomputers have been investigated for performing
signal processing operations, the motivation of these studies
was mainly to explore what the 8 bit microcomputers can
do for signal processing. For serious implementations, bit
slice microprocessors, which can be designed to emulate
16 bit, 32 bit or even longer word computers, have always
been the favored approach. However, 16 bit microcomputers are
being supported with more and more powerful hardware and
software and are approaching low-end minicomputer performance.

To examine the signal processing performance of
today's 16 bit MOS microcomputer, we coded the statistical
3x3 spatial filter on one main frame computer, IBM 360/67,
and two 16 bit microcomputers, DEC LSI-11 and Intel 8612,
using high order programming languages and single precision
numerical data format. Fortran is used for the IBM and DEC
computers. PLM86 is used for the Intel computer. The
execution times expressed in seconds are shown in Table IV.1
for comparison.


TABLE IV.1

IMAGE PROCESSING EXECUTION TIME
(in seconds)

Columns: Image Processing Operations; IBM 360/67; DEC LSI-11;
Intel 8612 (PLM86); Intel 8612 (Macro)

[Table entries are illegible in the source.]

It can be seen that the LSI-11 has better floating point
computation support today than Intel's 8612, which took 13 to 14 times
longer than the LSI-11 to perform these image processing
operations. The LSI-11 itself took approximately 6 times longer than
the IBM 360/67. Based on this comparison, the LSI-11 should
be chosen as the 16 bit microcomputer candidate. However,
Intel's 8612 was selected because of its larger physical
memory addressing space and its system Multibus support which
are much better suited for multiple microcomputer system
development.





Further, two of the three spatial filter modules were
coded in assembly language and a 32 bit integer data format
on the 8612. It was found that the execution times are quite
short, suggesting that even today's Intel 16 bit microcomputer,
without the assistance of hardware arithmetic devices, can
perform these rather sophisticated image processing operations
very well if compared with the main frame computer IBM 360/67.
More specifically, it took 0.72 seconds to compute the
autocorrelation matrix elements for the 3 x 3 spatial filter,
averaged over the 32 x 32 image, and 0.47 seconds to perform
the 3 x 3 spatial filtering over the image.

2. Implementation

In this chapter we will present the implementation 
results of our adaptive filter on the multiple microcomputer 
system. In Section B, the performance of spatial filters is 
discussed. In Section C, the performance of adaptive spatial 


filters will be discussed. 


The functions of various components of the intercon- 
nections and communication controllers have been described in 
previous sections using mainly signals generated by function 
generators. In this section, a test program was used to test 
and evaluate the data transfer behaviors of the system. This 
program is quite straightforward and fetches data from the RAM 
and displays them on a CRT terminal. However, the locations 
of the program and data are at different parts of the system 
to provide a thorough test of the data transfer and bus 


arbitration behaviors. 

Three tests were made.

The objective of the first two tests is to measure
the maximum rate of data transfer on the system bus. For
this purpose, both the program and data were stored either
in the global RAM located in another slave SBC, as in test
case 1, or in the global RAM located in the uPRO RAM board,
as in test case 2.
Therefore, the system bus was used very heavily because not
only must the data be fetched via the bus, but the program itself
must be read from the memory external to the testing SBC.


TABLE IV.2

MEMORY ALLOCATION FOR MULTIBUS TEST

Test   Location of   Location of
       Program       Data

1      Slave SBC     Slave SBC     Program and data being run
2      uPRO RAM      uPRO RAM      at maximum rate.

3      Master SBC    uPRO RAM      Program and data being run
                                   at approximately 20% of the
                                   maximum rate.





The maximum rates at which this test can run with 
one to six microcomputers are shown in Table IV.3. Several 
important facts can be noticed. 

(1) The bus transfer rate of each SBC is reduced 
when more and more SBCs want to use the bus, as it should be. 

(2) However, the maximum rate and amount of reduction
vary from test to test. For example, in Test 1, we
were able to transfer 710 Kbyte/sec at its maximum if only
one SBC is using the bus, as compared with a maximum rate of 911
Kbyte/sec for one SBC in Test 2. Test 2 showed
that it is quicker to get data out of the uPRO than the RAM
on a different SBC. This can be explained easily because
control on the SBC must decide whether the memory addressed
is on-board or off-board. This decision takes time, thus it
slows down the transfer rate. When more SBCs were added in
these two tests, the transfer rate of every SBC was decreased.
However, the rates of decrease were different in Test 1 and
Test 2, as shown in Table IV.3. They are also plotted in Fig.
4.1 to give a graphical view. It is obvious that substantial
deteriorations of the bus transfer rate took place in these
two cases, from 710 Kbyte/sec to 144 Kbyte/sec in Test 1 and
from 911 to 167.1 Kbyte/sec in Test 2.

(3) It should be pointed out that such heavy
usage of the system bus should be allowed to happen only
during tests. If a programmer prepared an application program
with such heavy bus usage, he has failed miserably in
partitioning his program for parallel and pipeline computation
in the multiple microcomputer system.

(4) Therefore, to provide a test more compatible
with real operational conditions, Test 3 was prepared
which has its program in the RAM of the master SBC and its
data in the global RAM in uPRO. Further, it was run at a rate
of 194.9 Kbyte/sec on the bus when only one SBC requested
the bus. It can be seen that the deterioration of the system
bus transfer rate is much more moderate, from 194.9 Kbyte/sec for one
SBC to 152 Kbyte/sec for six SBCs. This is a testimony to
the ability of the intercommunication controller in treating
all SBCs equally without allowing any one SBC to dominate the
bus usage.


TABLE IV.3

SYSTEM BUS TRANSFER RATE (Kbyte/sec) FOR EVERY SBC IN
THREE MULTIPLE MICROCOMPUTER SYSTEM TESTS


(5) Further, the overhead loss of transfer rate
in arbitrating the bus usage of several microcomputers is
small. Let us consider Test Case 2. The maximum bus transfer
rate took place when there were two SBCs using the bus
and was 2 x 522 = 1044 Kbyte/sec. When six SBCs were using
the bus, the total transfer rate on the bus was 6 x 167.1 =
1002.6 Kbyte/sec. The loss is only (1044 - 1002.6)/1044 =
0.0397, or 3.97%. Of course, each SBC suffered a loss of
(911 - 167.1)/911 = 81.66% in its bus usage rate. It is
interesting to note that 167.1 KBS for six SBCs is close to
one-sixth of the rate of 911 KBS if one SBC has the system
bus to itself.

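
The arithmetic in item (5) can be reproduced with a few lines of Python. The helper names are hypothetical, and the per-SBC rate for two SBCs is taken as 522 Kbyte/sec, the value implied by the stated aggregate of 1044 Kbyte/sec; the other figures are the ones quoted in the text for Test Case 2.

```python
def aggregate_rate(per_sbc_rate, n_sbc):
    """Total bus transfer rate when n_sbc SBCs each achieve per_sbc_rate."""
    return per_sbc_rate * n_sbc


def fractional_loss(reference, value):
    """Relative drop of `value` with respect to `reference`."""
    return (reference - value) / reference


peak_total = aggregate_rate(522.0, 2)    # two SBCs sharing the bus
six_total = aggregate_rate(167.1, 6)     # six SBCs sharing the bus

# Loss of aggregate throughput caused by arbitration among six SBCs
print(round(100 * fractional_loss(peak_total, six_total), 2))

# Loss seen by a single SBC compared with having the bus to itself
print(round(100 * fractional_loss(911.0, 167.1), 2))

# One-sixth of the single-SBC rate, for comparison with 167.1 Kbyte/sec
print(round(911.0 / 6, 1))
```

The first value is the small arbitration overhead on the bus as a whole; the second is the large per-SBC reduction that results simply from sharing one bus among six users.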

B. IMPLEMENTATION OF 3 x 3 SPATIAL FILTERING ON
MULTIPLE MICROCOMPUTER SYSTEM

1. Introduction

Four different implementations were compared.
They differed in the manner of storing the programs, variables
and data in various parts of the memory hierarchy and in some
programming techniques. For this development, all programs and
data were stored in RAM on the single board microcomputers.
These RAMs have been separated into two types:

° Unshared RAM: It is "private" to the microcomputer
where the RAM is located.

° Shared RAM: It is "global" and can be accessed
by other microcomputers on the same Multibus.

TABLE IV.4

PROGRAM DATA AND VARIABLE ALLOCATION

The results are presented in Fig. 4.2, which expresses the
number of frames per second on which the 3 x 3 spatial
filtering task can be performed as a function of the number of
microcomputers used to partition the spatial filtering into
parallel operations. It should be pointed out that the image
size is 50 x 530 pixels. The partitioning is to split the
image into equal parts for several microcomputers.
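
The partitioning idea can be sketched as follows. This is an illustration only, not the assembly language implementation used on the SBCs: it assumes the image is split into horizontal strips of nearly equal height, that border pixels are copied unchanged, and that each strip may read the rows adjacent to it. The names filter3x3, partition_rows and filter_partitioned are hypothetical.

```python
def filter3x3(image, kernel, row_start, row_end):
    """Apply a 3x3 kernel to rows [row_start, row_end) of the image."""
    height, width = len(image), len(image[0])
    out = []
    for r in range(row_start, row_end):
        out_row = []
        for c in range(width):
            if r in (0, height - 1) or c in (0, width - 1):
                out_row.append(image[r][c])   # assumption: borders are copied
                continue
            acc = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    acc += kernel[dr + 1][dc + 1] * image[r + dr][c + dc]
            out_row.append(acc)
        out.append(out_row)
    return out


def partition_rows(height, workers):
    """Split row indices into `workers` contiguous strips of near-equal size."""
    base, extra = divmod(height, workers)
    bounds, start = [], 0
    for w in range(workers):
        end = start + base + (1 if w < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def filter_partitioned(image, kernel, workers):
    """Filter each strip independently (one strip per microcomputer)."""
    out = []
    for start, end in partition_rows(len(image), workers):
        out.extend(filter3x3(image, kernel, start, end))
    return out


# Small synthetic image and an all-ones kernel, for illustration only.
image = [[r * 10 + c for c in range(6)] for r in range(8)]
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
whole = filter3x3(image, kernel, 0, len(image))
split = filter_partitioned(image, kernel, workers=3)
print("strips:", partition_rows(len(image), 3))
print("partitioned result matches single-processor result:", split == whole)
```

Because each output pixel depends only on input pixels, the strips can be processed independently, which is what makes this task a convenient test of parallel partitioning.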

The results are discussed in the following.

a. The first case is not a measured result. It
represents the ideal enhancement of computation by using
multiple microcomputers. We first measured the execution
speed of performing a spatial filter over the whole image
by one microcomputer with program, variables and data all
in the private unshared RAM of the SBC. There was no bus
usage, therefore no overhead due to bus communication. The
maximum filtering speed is roughly two thousand pixels processed
by this spatial filter per second. For more SBCs,
we simply multiplied the rate by the number of microcomputers
and plotted a "linear enhancement" curve. This represents
the ideal case and serves as the goal for our partitioning
to approach.

b. Let us start with the case of lowest performance,
Case 5. In this case, all program, variables and data were
located in the shared memory of another SBC. It obviously
required the maximum amount of transfer and system bus usage.
It can be seen that the performance saturated quite quickly.
We are obviously wasting the computational power of added
microcomputers.

c. Next, in Case 4, the program was stored
in the private memory of the computing SBC, but the variables
and data were stored in the global memory of another SBC.
The throughput performance improved almost linearly with
respect to the number of microcomputers but at a rate lower
than the "ideal linear enhancement."

d. In Case 3, both the program and variables were
stored in the unshared private RAM. But the data were stored
in the global RAM of another SBC. Further improvement was
accomplished. However, about 20% of the computing capability
was lost because of the overhead for the arbitration of multiple
microcomputer requests.

e. In Case 2, the locations of the program, variables
and data are the same as in Case 3, but the programming
is more clever in the sense that the number of accesses to
the system bus by each microcomputer is minimized and, further,
the occurrences of these system bus accesses were distributed
as evenly in time as possible. It can be seen that the enhancement
of total computing power is much closer to the
"ideal linear enhancement" case.

f. In summary, we have used the special case of spatial
filtering to explore the behavior and improvement of
computing by the multiple microcomputer system. It should
be pointed out that although there have been a lot of ideas
in this field, real experience is still very limited. Consequently,
there is really no consensus in the philosophy,
approaches and methodologies of effective partitioning for
parallel and pipeline computing. This thesis is a first step
in testing the uncharted water. We only used a spatial filter
to test the parallel processing. We have not used a problem
to test pipeline processing and combined parallel/pipeline
processing yet. Therefore, we do not intend to declare that
the experience learned from this spatial filtering established
a general methodology for effective partitioning.

But we feel that the following guidelines probably
will be helpful when more complex problems are tested
to develop a more thorough philosophy of partitioning:

a) The bus usage should be minimized.

b) The bus usage should be distributed more evenly
in time. Concentration of bus usage should be
avoided.
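
The two guidelines can be made concrete by counting system bus reads for a 3 x 3 filter. The sketch below is an illustration under stated assumptions, not the program used in the tests: it compares fetching every 3 x 3 neighborhood from global RAM against keeping a three-row window in private RAM and fetching one new row just before it is needed. The names naive_bus_reads and filter_with_row_window are hypothetical, and one "read" stands for one pixel transferred over the bus.

```python
def naive_bus_reads(height, width):
    """Bus reads if every 3x3 neighborhood is fetched from global RAM."""
    return 9 * (height - 2) * (width - 2)


def filter_with_row_window(global_image, kernel):
    """Filter interior pixels while holding only three rows privately.

    Returns (output_rows, reads_per_step). reads_per_step[0] is the
    initial two-row fetch; each later entry is the single row fetched
    just before the output row that needs it.
    """
    height, width = len(global_image), len(global_image[0])
    window = [list(global_image[0]), list(global_image[1])]
    reads_per_step = [2 * width]
    output_rows = []
    for r in range(1, height - 1):
        window.append(list(global_image[r + 1]))   # one bus burst per row
        reads_per_step.append(width)
        row_out = []
        for c in range(1, width - 1):
            acc = 0
            for dr in range(3):
                for dc in range(3):
                    acc += kernel[dr][dc] * window[dr][c + dc - 1]
            row_out.append(acc)
        output_rows.append(row_out)
        window.pop(0)                               # oldest row no longer needed
    return output_rows, reads_per_step


# Small synthetic image (6 rows x 5 columns) and an all-ones kernel.
test_image = [[r * 5 + c for c in range(5)] for r in range(6)]
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
rows, reads = filter_with_row_window(test_image, ones)
print("naive bus reads:   ", naive_bus_reads(6, 5))
print("windowed bus reads:", sum(reads))
print("reads per step:    ", reads)
```

In this model the windowed version reads each pixel over the bus once (guideline a), and after the initial fetch the reads arrive in equal bursts spaced one output row apart (guideline b).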

g. Meanwhile, it should be pointed out that this
implementation of spatial filtering is a test case based on
a real computation problem. In addition to the experience
learned for partitioning, the successful implementation of
the spatial filtering involving up to five microcomputers in
parallel processing convincingly proved that the random
priority controller is working correctly.


V. CONCLUSION AND RECOMMENDATIONS


A. CONCLUSION 
1. Motivation 

This thesis was motivated by the needs of new smart 
sensor developments. With the anticipation of new sensitive 
and large mosaic optical sensor arrays and very sophisticated 
signal/data processing capabilities to be offered by VLSI/
VHSIC electronics, very ambitious mission objectives of new 
surveillance, search/track and weapon guidance systems are 
being proposed and developed, which require new signal pro- 
cessing techniques to accomplish demanding goals. Further, 
they require very sophisticated processor systems which are 
powerful enough to implement the new signal processing 
algorithms and also small and light enough for mounting on 
platforms of practical systems. 

2. Single Objective and Dual Tasks

This thesis has one single objective, to help to
make the new "smart sensors" practical, but consists of two
tasks to achieve this objective.

a. Develop new adaptive filter techniques to process
infrared images for enhancement of "target signal"
to "background clutter noise" ratio.

b. Develop a new multiple microcomputer system to
implement this type of image processing.

3. Extensions and Contributions

Both studies, although motivated by the development 
of "infrared smart sensors," are generic and can contribute 
to broader fields much beyond the image processing problems 
in infrared smart sensor systems. 

4. Results I - Adaptive Filters

The following results have been obtained:

a. Adaptive filter research done in the past was
surveyed. It was found that:

° Practically all past research dealt with one dimen- 
sional problems, except one by B. Evenor who extended 
the LMS algorithm to images generated by Markov models. 

° Most approaches are based on LMS algorithms. 

b. In this thesis the LMS algorithm was extended to 
process real world infrared images. 

c. A new approach to nonrecursive adaptive filters 
was developed which is similar to searching for the extreme 
point in optimization problems. 

d. Two optimization criteria were considered:

mMSE = minimization of mean square error
MSNR = maximization of signal to noise ratio.

e. Seven different optimization/searching techniques
were developed:

° Gradient approaches - Steepest descent
                        Accelerated steepest descent
                        Amir's method (mMSE only)

° Conjugate gradient approaches - Fletcher-Reeves
                                  Pollack

° Variable metric approach - Davidon-Fletcher-Powell

° Amir's transform approach (MSNR only)

f. These approaches were tested on two infrared test
images:

° Indiana - Blue spike band infrared image appropriate
for high altitude downward looking infrared sensor
systems.

° China Lake - 10-13 micron thermal band infrared image
appropriate for shorter distance side-looking infrared
sensor systems.

The results are encouraging and showed that these new
adaptive filters are effective in suppressing background clutter
and enhancing the "target signal" to "clutter noise" ratio.
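
As an illustration of the gradient search idea behind the techniques listed under item e, the sketch below applies steepest descent with an exact line search to a quadratic mean square error surface. It is a textbook formulation under stated assumptions, not the algorithm as developed in this thesis: the error is taken as J(w) = c - 2 p.w + w.R.w, where R (an autocorrelation matrix) and p (a cross-correlation vector) are small made-up values, and the function names are hypothetical.

```python
def mat_vec(matrix, vector):
    """Matrix-vector product for plain nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def steepest_descent_mse(R, p, iterations=100, tol=1e-24):
    """Minimize J(w) = c - 2 p.w + w.R.w by steepest descent.

    The gradient is 2 (R w - p); for a quadratic surface the step that
    minimizes J along the gradient is (g.g) / (2 g.R.g).
    """
    w = [0.0] * len(p)
    for _ in range(iterations):
        grad = [2.0 * (rw - pi) for rw, pi in zip(mat_vec(R, w), p)]
        gg = dot(grad, grad)
        if gg < tol:                       # gradient vanished: at the minimum
            break
        step = gg / (2.0 * dot(grad, mat_vec(R, grad)))
        w = [wi - step * gi for wi, gi in zip(w, grad)]
    return w


# Made-up 2-weight example; the exact minimizer solves R w = p.
R = [[2.0, 1.0], [1.0, 2.0]]
p = [1.0, 0.5]
weights = steepest_descent_mse(R, p)
print("weights found by search:", weights)
```

For these values the exact solution of R w = p is w = (0.5, 0), so the search result can be checked directly. Conjugate gradient and variable metric methods follow the same pattern but choose better search directions than the raw gradient.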


5. Results II - Multiple Microcomputer System

a. The tightly-coupled multiple microcomputer research
done in the past was surveyed. It was found that:

° There are many conceptual designs of new multiple
microcomputer systems. Only a very small number of
these have embarked on actual developments with both
hardware and software efforts.

° More loosely coupled multiple microcomputer systems
are being developed. They are mostly computer networks.

° There are only two tightly coupled multiple microcomputer
systems in operation today based on the
survey of the open literature. Both are at Carnegie
Mellon University: Cmmp and Cm*. It should be noted
that although Cmmp is a multiple minicomputer system,
today's 16 bit microcomputers are fast approaching
minicomputer performance.


b. Based on an intensive consideration of the requirements
of typical new smart sensor systems, not only in
the mission signal processing area but also in the management,
control, and communication areas, it was decided that a
hierarchical architecture which supports simultaneous tightly
and loosely coupled systems is attractive.

c. A multiple star, multiple cluster architecture
using commercially developed 16 bit microcomputers was
developed. A complete star bus switch network was developed
which is managed by a control system consisting of three
levels of control: random priority controller, distributed
controller, central controller.

d. The basic concept of this hardware architecture
has been tested by simulated intercommunications.
Extensive tests in real signal/data processing environments
are awaiting the successful development of operating systems.


6. Results III - Implementation of Adaptive Spatial
Filters on Microcomputers and Multiple
Microcomputer Systems

a. The spatial filter program was coded for one
main frame, the IBM 360-67, and two 16 bit microcomputers:
one DEC LSI-11 and one Intel 8612. The DEC LSI-11 has more
mature floating point mathematics software and a hardware
arithmetic IC chip, but is not as well suited for multiple
microcomputer system development as the Intel 8612, whose
floating point software is still very primitive. However,
when coded in assembly language, the Intel 8612 performs
the spatial filtering faster than the main frame coded in
high order language.

b. By using only one 16 bit 8612 microcomputer,
the computation times for the 3 x 3 spatial filter
and a 32 x 32 image have been measured as follows:

Spatial statistics computation = 0.7/2 sec.
Adaptive spatial filter design = 1.0 sec.
(Conjugate gradient Pollack method)
Perform spatial filtering = 0.47 sec.


c. Several ways of using the multiple microcomputer 
implementation by placing program, variables and data in the 
unshared private RAM and/or the shared global RAM have been 
investigated. 

It was found that the best enhancement of total
execution speed of the spatial filtering with more microcomputers
is obtained by storing the program and variables in the private
RAM and the data in the global RAM. The image data is not
moved into the microcomputer all at once. Instead, the data
is moved, one at a time, into the private RAM of the microcomputer
only moments before it is needed for processing.


B. RECOMMENDATION 
1. General 

Both topics covered in this thesis are quite new. 
This research only opens the gate a little into two fields 
worthy of more investigations. Although this thesis is con- 
cerned mainly with the image processing developments and 
their implementations for infrared smart sensors, the techniques
developed are generic and can be applied to much
broader fields beyond smart sensors.

2. Adaptive Filters


The new techniques based on the concepts of gradient
optimization search can be applied to most of the adaptive
filter research done in the past using the LMS algorithm.

For adaptive image processing applications, they
should be used to develop adaptive temporal filters if a
series of successive frames of images are rather well registered
spatially from frame to frame, although there may be
drift, jitter, rotations, etc. between frames.

Testing of these adaptive filters using more challenging
real world images which have serious non-stationarity
should be performed to give the adaptive filtering techniques
some tough challenges. Jamming and interference noises should
be considered. The convergence time of the compiled adaptive
filter programs should be measured to obtain the relative speed
of convergence of all the adaptation methods. Adaptive filters
for extended targets should be developed.

3. Multiple Microcomputer System

Although the subject of multiple microcomputer systems
is not new, there are many unresolved questions that have
hardly been touched because of the extensive effort required
to make any type of multiple microcomputer system operational.
Only two such systems are known to be working today, Cmmp and
Cm*, although many system architectures have been proposed
and conceptualized. A small number of these have been simulated.
A smaller number of them are being emulated. An even
smaller number of them are being built. Simulations and
modeling used today for multiple microcomputer systems must
be carefully and critically scrutinized for their validity
and usefulness. It is extremely important to examine how
the intercommunication overhead is modeled and simulated.
There is very little first-hand experience in existence today.
Therefore, a wide variety of problems associated with
the new multiple microcomputer systems must be researched,
examined and answered.

This thesis contributed to the formulation, design, fabrication
and test of a multiple microcomputer system which
can be used:

1. Not only for developing effective ways of implementing
smart sensor image processing, in general, and the adaptive
image processing, in particular,

2. But also as a test bed to develop, verify, and improve
several basic issues of multiple microcomputer systems.
Included were considerations of:


a. Effective and alternative intercommunication for 
combined tightly and loosely coupled systems. 

b. Effective and alternative operating systems for 
real time signal processing, multi-tasking, multi-users, 
security, dynamic reconfiguration and fault tolerance. 

c. Effective and alternative programming methodologies
for partitioning a given problem into a number of modules suitable
for combined pipeline and parallel implementation on
multiple microcomputer systems.

d. Effective and alternative ways of using the distributed
capabilities of multiple microcomputer systems for
fault tolerance, self-maintenance and error recovery.